INTRODUCTION

The measurement and use of visual motion is one of the most fundamental abilities of
biological vision systems, serving many essential functions. For example, a sudden movement
in the scene might indicate an approaching predator or a desirable prey. The rapid expansion
of features in the visual field can signal an object about to collide with the observer.
Discontinuities in motion often occur at the locations of object boundaries and can be used
to carve up the scene into distinct objects. Motion signals provide input to centers
controlling eye movements, allowing objects of interest to be tracked through the scene. Relative
movement  can  be  used  to  infer  the  three-dimensional  (3-D)  structure  and  motion  of  object 
surfaces,  and  the  movement  of  the  observer  relative  to  the  scene,  allowing  biological  systems 
to  navigate  quickly  and  efficiently  through  the  environment.  More  generally,  the  analysis  of 
visual  motion  helps  us  to  maintain  continuity  of  our  perception  of  the  constantly  changing 
environment  around  us. 

This  article  reviews  our  current  understanding  of  a  number  of  aspects  of  visual  motion 
analysis  in  biological  systems,  from  a  computational  perspective.  We  illustrate  the  kinds 
of  insights  that  have  been  gained  through  computational  studies  and  how  they  can  be 
integrated  with  experimental  studies  from  psychology  and  the  neurosciences,  to  understand 
the  particular  computations  used  by  biological  systems  to  analyze  motion.  In  the  remainder 
of  this  introduction,  we  briefly  describe  the  computational  approach  to  the  study  of  vision 
and  discuss  the  areas  of  motion  analysis  that  are  addressed  in  this  review. 

The  Computational  Study  of  Vision 

One  of  the  most  important  tenets  underlying  a  computational  approach  to  the  study  of 
biological  vision  is  the  belief  that  the  brain,  like  a  computer,  can  be  thought  of  as  a  machine 
that  processes  information  extracted  from  the  environment,  resulting  in  some  sort  of  action. 

Like  Aristotle,  Galen,  and  Descartes  before  us,  we  often  think  of  the  brain  in  terms  of  our 
most  successful  machines,  which  today  happen  to  be  digital  computers.  We  must  be  careful 
in making such an analogy, however. The electrochemical environment of neurons, their
means of transmitting information, and their overall architecture are very different from those
of the wires and etched crystals of semiconducting material that comprise computers. The
Turing machine, a core concept of computer science, works in a discrete mode in a world
determined by classical physics. Such a machine can only approximate the truly analog
operations of biological hardware in a world governed by the laws of quantum physics.

Although  their  hardware  differs  greatly,  both  biological  systems  and  machines  can 
perform  similar  functions  that  rely  on  the  same  mathematical  and  physical  principles.  Thus, 
there exists a level of description of the tasks performed by these two systems that is
independent of the underlying hardware. In order to understand how natural or artificial
systems can solve problems like sensing motion or depth or manipulating the environment,
we must understand the nature of the problem (for example, whether it can be solved at
all and what constraints the physical world imposes on the solution) before we can fully
understand the detailed procedures used to find a solution.

A  computational  approach  to  the  study  of  biological  systems,  based  on  the  founding 
principles  of  the  field  of  Artificial  Intelligence,  was  elucidated  by  Marr  and  Poggio  (1977; 
Marr,  1982).  Marr  was  attracted  to  the  field  of  Artificial  Intelligence  after  experiencing 
certain  limitations  of  other  theoretical  approaches  to  brain  research  in  his  early  work  on 
the  cerebellum  (Marr,  1969).  Although  his  model  for  learning  in  the  cerebellum  has  led  to 
important  experimental  work  (for  example,  Ito,  1984),  Marr  abandoned  this  line  of  research 
after  realizing  that  it  did  not  shed  light  on  how  complex  motor  behavior  can  actually  be 
achieved. 

In his later work in computational vision, Marr elucidated three distinct levels of
analysis that are necessary for understanding an information processing task:

•  A  computational  theory  analyzes  what  problem  is  being  solved  and  why,  and  investigates 
the  natural  constraints  that  the  physical  world  imposes  on  the  solution  to  the  problem. 

•  An  algorithm  is  a  detailed  step-by-step  procedure  that  represents  one  method  for 
yielding  the  solution  indicated  by  the  theory. 

•  An  implementation  is  a  physical  realization  of  the  algorithm  by  some  mechanism  or 
hardware. 

These  levels  could  suggest  a  prescription  for  conducting  research  on  complex  problems;  that 
is,  one  first  formulates  a  theory,  then  derives  an  algorithm,  and  lastly  designs  a  mechanism 
that  implements  the  algorithm: 

theory  =>  algorithm  =>  mechanism. 

Despite  the  initial  success  of  this  approach,  research  over  the  past  few  years  has  shown 
that  computational  theories,  even  if  complemented  by  psychophysical  experiments  revealing 
how  humans  perform  visual  tasks,  have  inherent  limitations  in  understanding  the  brain.  In 
particular,  the  nature  of  the  hardware  can  profoundly  influence  the  type  of  algorithm  needed 
to  solve  a  particular  problem.  Thus,  while  the  computational  theory  and  properties  of  the 
hardware  can  often  be  studied  independently,  the  algorithmic  level  is  influenced  by  both. 
A  given  computation,  such  as  the  computation  of  stereo  depth  or  motion,  usually  can  be 
performed  by  several  different  algorithms.  These  algorithms  depend  not  only  on  the  nature 
of  the  computation  itself,  but  also  on  the  properties  and  limitations  of  the  hardware  in 
which  the  algorithm  is  implemented.  Thus,  in  order  to  explain  the  functions  of  a  visual 
system  at  its  different  levels,  not  only  must  the  abstract,  computational  nature  of  a  task  be 
understood,  but  also  the  properties  of  the  underlying  hardware.  The  flow  of  information  is 
therefore  in  both  directions: 

theory => algorithm <= mechanism.

These  observations  stress  the  importance  of  integrating  the  results  of  computational  studies 
with  those  of  experimental  studies  of  biological  vision  systems. 

Other  introductions  to  the  computational  approach  described  here  can  be  found,  for 
example,  in  Poggio  (1984),  Morgan  (1985),  Ullman  (1986),  and  Hildreth  and  Hollerbach 
(1985).  The  latter  review  also  addresses  the  limitations  and  successes  of  the  computational 
approach  in  the  area  of  motor  control. 

Other  “Computational”  Approaches  to  the  Study  of  Biological  Systems 

The term computational is often used within the neurosciences to denote very different
concepts. For example, certain neural modeling approaches that study how neuronal
networks can operate and how these operations can be extrapolated to explain higher brain
functions frequently are termed “computational.” Examples of this include the seminal
work by McCulloch and Pitts (1943) on neuronal networks, the work on perceptrons
(Minsky and Papert, 1969) and parallel “connectionist” networks (Ballard, 1986), as well as
Marr’s  original  work  on  the  cerebellum.  The  word  “computational”  in  this  case  refers  to 
the  detailed  working  of  specialized  hardware,  such  as  linear  threshold  automata,  rather  than 
to  an  analysis  of  information  processing  at  a  level  independent  of  the  underlying  hardware. 
Similarly,  connectionist  theories  refer  directly  to  neuronal  hardware  and  therefore  lack  the 
characteristics  of  Marr’s  notion  of  a  computational  theory  (Koch,  1986).  Although  they 
have  made  important  contributions  to  automata  theory  and  theoretical  cybernetics,  we 
want  to  emphasize  a  distinction  between  these  approaches  and  that  described  by  Marr  and 
Poggio (1977; Marr, 1982). It is of course essential to understand the properties of the
biological hardware (neurons, dendrites, synapses, channels, etc.) in order to understand
what  algorithms  the  brain  uses  to  analyze  its  environment,  and  a  substantial  fraction  of 
this  article  is  devoted  to  aspects  of  neuronal  hardware.  We  believe,  however,  that  to  fully 
understand  a  complex  information  processing  system,  it  is  necessary  first  to  understand  the 
nature  of  the  tasks  the  system  is  required  to  perform. 

Finally,  computational  is  used  in  yet  another  sense,  as  in  computational  chemistry  or 
computational  biophysics.  This  term  generally  refers  to  the  extensive  use  of  computers  to 
simulate  a  given  chemical  or  biophysical  system,  such  as  the  reconstruction  of  the  tertiary 
structure  of  simple  proteins  by  using  the  principles  of  quantum  physics  and  chemistry 
(Clementi,  1985)  or  the  simulation  of  the  electrical  properties  of  an  array  of  pyramidal  cells 
in  the  hippocampus  (Traub  et  al.,  1984).  In  the  following  pages  we  refer  frequently  to  such 
simulations  of  biophysical  circuits. 
Overview of Visual Motion Analysis


The  pattern  of  movement  in  a  changing  image  is  not  given  to  the  visual  system  directly, 
but  must  be  inferred  from  the  changing  intensities  that  reach  the  eye.  The  3-D  shape 
of  object  surfaces,  the  locations  of  object  boundaries,  and  the  movement  of  the  observer 
relative  to  the  scene  can  in  turn  be  inferred  from  the  pattern  of  image  motion.  Typically,  the 
overall  analysis  of  motion  is  divided  into  two  stages:  first,  the  measurement  of  movement  in 
the  changing  two-dimensional  (2-D)  image,  and  second,  the  use  of  motion  measurements, 
for  example  to  recover  the  3-D  layout  of  the  environment.  It  is  not  clear  whether  motion 
analysis  in  biological  systems  is  necessarily  performed  in  two  distinct  stages,  but  this  division 
has  served  to  facilitate  theoretical  studies  of  motion  analysis  and  to  focus  empirical  questions 
for  perceptual  and  physiological  studies. 

The  measurement  of  movement  can  itself  be  divided  into  multiple  stages  and  may  be 
performed  in  different  ways  in  biological  systems.  In  the  human  visual  system  alone,  motion 
may  be  measured  by  at  least  two  processes,  termed  short-range  and  long-range  processes 
(for  example,  Braddick,  1974,  1980).  The  short-range  process  analyzes  continuous  motion, 
or  motion  presented  discretely  but  with  small  spatial  and  temporal  displacements  from  one 
moment  to  the  next.  The  long-range  process  may  then  analyze  motion  over  larger  spatial 
and  temporal  displacements,  as  in  apparent  motion.  Evidence  indicates  that  these  two 
processes interact at some stage (Clatworthy and Frisby, 1973; Green and von Grünau,
1983), but initially they may be somewhat independent.

The subsequent uses of motion measurements impose different requirements on the
precision and completeness with which image motion must be represented. The localization
of object boundaries requires the detection of sharp changes in direction or speed of
movement, but may not need a precise representation of absolute velocities everywhere. Object
tracking requires knowledge of the gross translation of an object, but not information about
the detailed relative movements that take place within the object. The recovery of the
accurate 3-D shape of a moving object, on the other hand, appears to require a more precise
and complete estimate of the local variations of motion across object surfaces. Motion
analysis in the human visual system may ultimately involve the interaction of many processes,
some  fast  but  rough,  others  slow  but  more  accurate,  and  still  others  that  are  specialized 
for  specific  tasks  such  as  detecting  object  boundaries  or  looming  motion.  These  processes 
must  work  together  in  a  way  that  provides  a  versatile  and  robust  motion  analysis  system. 

In  this  review,  we  first  address  the  earliest  stage  of  motion  measurement.  We  discuss 
two  important  theoretical  models  of  motion  detection,  correlation  and  gradient  models,  and 
present relevant psychophysical and physiological data regarding biological motion
detectors. We then discuss at length possible biophysical mechanisms that implement the
computations underlying motion discrimination in retinal and cortical neurons. Later stages of
motion measurement are then discussed in a subsequent section, which addresses the
computation of an instantaneous 2-D velocity field, long-range motion correspondence, and
the detection of motion discontinuities. Finally, we discuss the recovery of 3-D structure
from relative motion. This article is not intended as an exhaustive overview of work on
motion  analysis.  Rather,  we  highlight  some  of  the  areas  that  exhibit  fruitful  interactions 
between  computational  and  experimental  studies.  Two  recent  reviews  of  motion  analysis 
include  the  surveys  by  Barron  (1984),  focusing  on  computational  methods  for  deriving  and 
interpreting  optical  flow,  and  by  Nakayama  (1985),  focusing  primarily  on  the  psychophysics 
and  physiology  of  motion. 
EARLY MOTION DETECTION AND MEASUREMENT

Detecting  Motion:  Theory 

Before motion can be used to reconstruct the 3-D structure of objects, the visual
system must first reliably detect and measure relative motion in the 2-D image. What types
of schemes have been proposed for this initial detection, how are these schemes related,
and what are their computational properties? The most general property of any motion
discrimination  system  is  that  the  underlying  operation  must  be  nonlinear.  As  first  noted  by 
Poggio  and  Reichardt  (1973),  no  linear  operation  can  extract  the  direction  of  motion  of  a 
moving  stimulus.  The  schemes  proposed  for  motion  detection  fall  broadly  into  two  classes: 
(1)  correlation-like  schemes  (Hassenstein  and  Reichardt,  1956;  Poggio  and  Reichardt,  1973; 
van  Santen  and  Sperling,  1984)  and  (2)  gradient  schemes  (Fennema  and  Thompson,  1979; 
Horn  and  Schunck,  1981;  Marr  and  Ullman,  1981).  As  we  shall  see,  most  biological  motion 
detection schemes cannot reliably measure velocity even for one-dimensional motions,
because their output typically depends on contrast and on a mixture of velocity and spatial
structure  of  the  moving  pattern  (Reichardt,  Poggio  and  Hausen,  1983). 

CORRELATION  MODELS  The  best  known  motion  detection  scheme  is  based  on  research 
done over the last thirty years on movement perception in insects. On the basis of open-
and closed-loop experiments performed first on the beetle, Chlorophanus, and later on
the fruitfly, Drosophila, and the housefly, Musca domestica, a number of researchers, most
notably W. Reichardt, were led to the following conclusions regarding motion discrimination
in insects (Hassenstein and Reichardt, 1956; Varjú and Reichardt, 1967; Götz, 1968, 1972;
Reichardt, 1969; Poggio and Reichardt, 1976; Reichardt and Guo, 1986):

i)  A  sequence  of  two  light  stimuli  impinging  on  adjacent  receptors  is  the  elementary 
event  that  evokes  an  optomotor  response. 

ii) The relation between the stimulus input to these two receptors and the optomotor
output follows the rule of algebraic sign multiplication. For instance, stimulating
receptor 1 with alternating dark to light changes and receptor 2 with light to dark
transitions leads to a turning response of the insect opposite to the direction of stimulus
succession, while dark to light transitions presented to both receptors elicit a turning
response in the direction of the stimulus succession.

iii)  The  strength  of  the  optomotor  response  is  proportional  to  the  product  of  the  two 
stimuli. 

On  the  basis  of  these  experimental  conclusions,  a  minimum  mathematical  model  of 
motion perception in insects was formulated. Figure 1a shows a modified version of this
correlation model. The image is sampled by a receptor with a point-like receptive field.
The input to the receptor can thus be described by I(t). The output of the receptor is
subsequently passed through a linear high-pass filter, removing steady state components of
the output of the receptor, before being multiplied with a low- or band-pass filtered signal
from a neighboring receptor. Thus, at this stage the signal strength is given by:

$$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} W(t_1, t_2)\, I(t - t_1)\, I(t - t_2)\, dt_1\, dt_2$$

where W(t_1, t_2) represents the lumped transfer function for the different filters.
Subsequently, the output of the multiplication operation is integrated over time. A little analysis
will  show  that  the  output  of  this  stage  is  equivalent  to  the  autocorrelation  of  the  input 
function I(t). Let us assume that the low-pass filter actually corresponds to a fixed delay
δt > 0. We are then essentially multiplying a linearly transformed version of I(t) with itself,
but shifted by the total amount Δt = δt + Δx/v (where Δx > 0 is the spacing between the
receptors and v the velocity of the stimulus), and integrating the resulting function over
time. For a range of negative velocities, i.e. movement from the right to the left, Δt will
be very small and the final output of this subunit will be large. For positive velocities,
that is for movements in the opposite direction, the two functions I(t) and I(t + Δt) are
out of synchrony and their product, integrated over time, will be small. The output of this
subunit  is  then  subtracted  from  the  output  of  the  complementary  subunit  to  yield  the  total 
detector  response.  It  follows  that  if  the  output  of  the  right  subunit  exceeds  the  output  of 
the  left  subunit,  the  detector  response  is  positive,  indicating  rightward  motion;  likewise,  if 
the  output  of  the  left  subunit  exceeds  the  output  of  the  right  subunit,  detector  response 
is  negative,  indicating  leftward  motion.  This  theoretical  model  has  a  number  of  properties 
that  can  be  tested  experimentally.  Two  of  the  most  interesting  are  phase  invariance  and 
spatial  aliasing  (for  an  overview  see  Reichardt,  1969). 
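
The opponent structure described above can be illustrated with a minimal numerical sketch. It is not the authors' model: it assumes a pure delay in place of the low-pass filter, mean subtraction in place of the high-pass filter, and a drifting sine grating as the stimulus. The function names (`receptor_signals`, `correlation_detector`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def receptor_signals(velocity, wavelength=4.0, dx=1.0, dt=1e-3,
                     duration=2.0, contrast=0.5):
    """Intensity seen by two point receptors at x = 0 and x = dx while a
    sine grating drifts past at `velocity` (positive = rightward)."""
    t = np.arange(0.0, duration, dt)
    k = 2.0 * np.pi / wavelength
    left = 1.0 + contrast * np.sin(k * (0.0 - velocity * t))
    right = 1.0 + contrast * np.sin(k * (dx - velocity * t))
    return left, right

def correlation_detector(left, right, delay_samples=50):
    """Time-averaged response of two mirror-symmetric correlation subunits."""
    # Assumed stand-in for the high-pass filter: remove the steady-state level.
    left = left - left.mean()
    right = right - right.mean()
    # Assumed stand-in for the low-pass filter: a pure delay of delay_samples.
    left_delayed = np.roll(left, delay_samples)
    right_delayed = np.roll(right, delay_samples)
    valid = slice(delay_samples, None)  # drop samples that np.roll wrapped around
    # Multiply, then average over time (the integration stage).
    rightward_subunit = np.mean(left_delayed[valid] * right[valid])
    leftward_subunit = np.mean(right_delayed[valid] * left[valid])
    # Opponent stage: positive output signals rightward motion.
    return rightward_subunit - leftward_subunit

for v in (10.0, -10.0):
    print(v, correlation_detector(*receptor_signals(v)))
```

With these assumed parameters the sign of the output follows the direction of drift, while its magnitude depends on contrast, delay and wavelength, which is why such a detector does not by itself report velocity.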

Imagine  a  light  pattern  consisting  of  a  number  of  superimposed  sinusoidal  gratings  of 
different  spatial  frequencies.  Because  the  process  of  autocorrelation,  i.e.  multiplication  and 
subsequent  integration,  destroys  all  of  the  information  that  is  inherent  to  the  specification  of 
the  phases  of  the  gratings,  the  output  of  the  motion  detectors  is  invariant  to  any  changes  in 
the  phase  relations  of  the  sinusoidal  gratings.  Because  any  pattern  I(t)  can  be  decomposed 
into  its  Fourier  components,  it  follows  that  this  class  of  motion  detectors  does  not  sense 
the  relative  position  of  the  Fourier  components.  This  important  result  has  been  tested  and 
confirmed in experiments with the beetle, Chlorophanus, the fruitfly, Drosophila, and with
Musca by evaluation of the time-averaged optomotor reactions to the angular motion of a
fixed pattern painted on the inside of a drum. Moreover, the total time-averaged response is
simply given by the sum of the time-averaged responses to the individual Fourier components
(Poggio  and  Reichardt,  1973).  Figure  2a  shows  the  angular  distribution  of  the  brightness  of 
two  distinct  patterns,  obtained  by  superposition  of  the  different  Fourier  components.  These 
patterns  only  differ  with  respect  to  their  phase  relations.  Yet  the  fruitfly  reacts  equally  to 
motions of the two patterns (Götz, 1972).
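
Phase invariance and the additivity of the time-averaged response can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of the experiments: it uses a pure circular delay instead of the model's filters, averages over exactly one period of the fundamental, and the names `opponent_correlator` and `compound_grating_response` as well as all parameter values are hypothetical.

```python
import numpy as np

def opponent_correlator(left, right, delay):
    """Time-averaged opponent output. np.roll gives a circular delay, which is
    exact here because the signals cover a whole number of periods."""
    left_delayed = np.roll(left, delay)
    right_delayed = np.roll(right, delay)
    return np.mean(left_delayed * right) - np.mean(right_delayed * left)

def compound_grating_response(phases, amplitudes=(1.0, 0.5, 0.25), dx=1.0,
                              wavelength=8.0, velocity=4.0,
                              n_samples=2000, delay=100):
    """Response to a rightward-drifting pattern built from harmonics 1, 2, 3
    of `wavelength`, each with its own amplitude and phase."""
    period = wavelength / velocity                  # period of the fundamental
    t = np.arange(n_samples) * period / n_samples   # exactly one period
    k = 2.0 * np.pi / wavelength

    def receptor(x):
        signal = np.zeros(n_samples)
        for n, (a, p) in enumerate(zip(amplitudes, phases), start=1):
            signal += a * np.sin(n * k * (x - velocity * t) + p)
        return signal

    return opponent_correlator(receptor(0.0), receptor(dx), delay)

aligned = compound_grating_response((0.0, 0.0, 0.0))
scrambled = compound_grating_response((0.3, 1.7, -2.1))
print(aligned, scrambled)
```

The two patterns look quite different as functions of position, yet under these assumptions the averaged outputs agree to rounding error, and each equals the sum of the responses to the three components presented alone.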

Figure 1. (a) A direction selective subunit of the correlation model of Hassenstein and Reichardt
(1956) as modified by Kirschfeld (1972). The two inputs are multiplied after low pass filtering with
different time constants. If an average operation is made on the output, the overall operation is
equivalent to cross-correlation of the two inputs. Subsequently, the time-averaged response of this
subunit is subtracted from the response of a similar but mirror-symmetric subunit to yield the final
movement sensitive response. (b) The functional scheme proposed by Barlow and Levick (1965)
to account for direction selectivity in the rabbit retina. A pure delay Δt is not necessary: a low
pass filtering operation is sufficient. (c) The equivalent electrical circuit of the synaptic interaction
assumed to underlie direction selectivity as proposed by Torre and Poggio (1978). The interaction
implemented by the circuit is of the type g1 − αg1g2, where g1 and g2 represent the excitatory and
inhibitory synaptic inputs. From Torre and Poggio (1978).

For any particular sinewave grating, the temporal phase difference between the two
inputs to the multiplication will depend on the distance between its input channels, Δx,
and on the spatial wavelength λ of the sinewave grating used. The original correlation
model displays spatial aliasing: if one changes the spatial period of the grating, but not
its direction of motion, the sign of the detector response reverses, indicating an incorrect
motion. Within the wavelength region λ > 2Δx, the moving sinusoidal pattern is resolved by
the receptor system as the number of samples received per period λ at any time is greater
than or equal to two. If, however, λ < 2Δx, optimal resolution of the periodic pattern
breaks down, because fewer than two samples per wavelength of the pattern are observed
(see also Shannon’s sampling theorem) and the detector signals the incorrect direction for
Δx < λ < 2Δx (Figure 2b). This inversion of apparent motion does occur in various insects
and has been used to determine the grating constant of the receptor spacing (Reichardt,
1969).
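
The sign reversal can be reproduced with a small sketch of a point-receptor correlator. The assumptions are loud ones: a pure circular delay replaces the filters, the temporal frequency of the grating is held fixed while its wavelength changes, and the function name `detector_response` and the parameter values are hypothetical. For a unit-amplitude grating this simplified detector has the closed-form output sin(ωδt)·sin(2πΔx/λ), so the sign is set by the spatial factor alone.

```python
import numpy as np

def detector_response(wavelength, dx=1.0, n_samples=1000, delay=100):
    """Time-averaged opponent-correlator output for a sine grating that always
    drifts rightward, sampled by point receptors at x = 0 and x = dx."""
    # One full temporal period, so the circular delay from np.roll is exact.
    temporal_phase = 2.0 * np.pi * np.arange(n_samples) / n_samples
    left = np.sin(-temporal_phase)
    right = np.sin(2.0 * np.pi * dx / wavelength - temporal_phase)
    left_delayed = np.roll(left, delay)
    right_delayed = np.roll(right, delay)
    return np.mean(left_delayed * right) - np.mean(right_delayed * left)

# Wavelengths are expressed in units of the receptor spacing dx.
for wavelength in (4.0, 2.0, 1.5):
    print(wavelength, detector_response(wavelength))
```

Under these assumptions the output is positive for λ = 4Δx, vanishes at λ = 2Δx, and turns negative for λ = 1.5Δx even though the grating still moves to the right, which corresponds to the inverted motion percept described above.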



Computations  Underlying  Motion 


Hildreth & Koch



Figure  2.  Two  experimental  predictions  of  the  correlation  model,  (a)  Phase  invariance:  The  left 
part  of  the  figure  shows  two  different  light  patterns  received  by  a  photoreceptor  at  different  angular 
positions  of  the  environment.  Both  distributions  contain  the  same  set  of  Fourier  components  shown 
in  the  right  part  of  the  figure,  but  with  different  phases.  However,  insects  like  the  housefly,  the 
fruitfly  or  the  beetle  respond  with  the  same  optomotor  reaction  to  both  patterns.  At  the  moment, 
it  is  not  known  whether  direction  selective  cells  in  the  mammalian  visual  system  show  phase 
invariance. (b) Inverse motion perception: interference phenomena in the insect eye elicited by a
moving pattern with a comparatively small spatial wavelength. When the distance between
input channels in the insect’s eye is between one-half and one spatial period λ of the pattern of
excitation, the correlation model signals the incorrect direction of motion. The insect is compelled
to follow this apparent motion in the direction opposite to the “true” direction of motion. Redrawn
from Götz (1972).
This property of the original correlation model can be avoided by replacing the
point-shaped receptive field of the receptor in the original Reichardt model with a spatially
dependent receptive field of finite extent (Fermi and Reichardt, 1963; Götz, 1965; Reichardt,
Poggio and Hausen, 1983; van Santen and Sperling, 1984, 1985). Van Santen and Sperling
show how to choose the receptive field in their elaborated Reichardt detector so that the sign
of the detector output is correct for any drifting sinewave grating. Van Santen and Sperling
(1985) showed that the elaborated Reichardt model is fully equivalent to two recently
proposed models of human motion detection: an elaborated version of the motion detector
of Watson and Ahumada (1985) and the “spatiotemporal energy” motion detector of Adelson
and Bergen (1985). These and similar models characterized by a multiplication-like
nonlinearity are all equivalent to the correlation model (Poggio and Reichardt, 1973).

GRADIENT  MODELS  Gradient  schemes  rely  on  the  relationship  between  the  spatial  and 
temporal gradients of image intensity. In the case of the one-dimensional movement of an
intensity profile I(x,t) over a small displacement dx in time dt, the temporal derivative of
image intensity I_t ≈ (I(x, t + dt) − I(x,t))/dt and the spatial derivative of the intensity
I_x ≈ (I(x + dx, t) − I(x,t))/dx are related by

$$v = \frac{dx}{dt} = -\frac{I_t}{I_x}$$

where  v  is  the  velocity  of  the  pattern.  This  method  was  originally  proposed  by  Limb 
and  Murphy  (1975)  and  later  extended  by  Fennema  and  Thompson  (1979).  The  approach 
carries  over  to  the  2-D  case  (Horn  and  Schunck,  1981).  Here,  however,  due  to  a  fundamental 
limitation  in  the  measurement  process,  termed  the  aperture  problem  (discussed  later),  only 
the  component  of  the  velocity  in  the  direction  of  the  brightness  gradient  can  be  measured. 
If  we  assume  that  the  motion  measurement  process  occurs  along  an  edge,  only  the  velocity 
component at right angles to the edge can be recovered. It is given by

$$v_\perp = -\frac{I_t}{\sqrt{I_x^2 + I_y^2}}$$

where I_x, I_y are the spatial derivatives in the x and y directions. This equation is strictly
only correct for rigid, translating patterns, with no rotation, seen under orthographic
projection (Schunck, 1984). For sufficiently small temporal and spatial displacements dx, dy,
and dt, however, the equation approximates the correct one. Gradient schemes suffer from
the  disadvantage  that  they  require  computation  of  the  derivatives  of  the  intensity  values, 
an  operation  that  is  extremely  sensitive  to  noise. 
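
A minimal finite-difference sketch of the gradient scheme follows. It assumes noise-free synthetic gratings, a forward difference in time, NumPy's `np.gradient` for the spatial derivatives, and an arbitrary threshold that discards points where the spatial gradient is close to zero. The function names, the threshold and the test patterns are hypothetical, and the 2-D function returns the speed along the local brightness gradient, whose sign is therefore relative to the gradient direction.

```python
import numpy as np

def gradient_velocity_1d(frame0, frame1, dx, dt):
    """Estimate v = -I_t / I_x at each sample of a 1-D intensity profile."""
    i_t = (frame1 - frame0) / dt
    i_x = np.gradient(frame0, dx)
    v = np.full(frame0.shape, np.nan)
    valid = np.abs(i_x) > 0.1 * np.abs(i_x).max()   # assumed threshold
    v[valid] = -i_t[valid] / i_x[valid]
    return v

def normal_speed_2d(frame0, frame1, dx, dy, dt):
    """Speed along the brightness gradient: -I_t / sqrt(I_x^2 + I_y^2)."""
    i_t = (frame1 - frame0) / dt
    i_y, i_x = np.gradient(frame0, dy, dx)          # axis 0 is y, axis 1 is x
    magnitude = np.hypot(i_x, i_y)
    speed = np.full(frame0.shape, np.nan)
    valid = magnitude > 0.1 * magnitude.max()       # assumed threshold
    speed[valid] = -i_t[valid] / magnitude[valid]
    return speed

def demo_1d(v=2.0, dx=0.01, dt=1e-3, wavelength=0.5):
    x = np.arange(0.0, 1.0, dx)
    pattern = lambda t: np.sin(2.0 * np.pi * (x - v * t) / wavelength)
    return np.nanmedian(gradient_velocity_1d(pattern(0.0), pattern(dt), dx, dt))

def demo_2d(velocity=(3.0, 0.0), theta=np.pi / 3, d=0.01, dt=1e-3, wavelength=0.5):
    x = np.arange(0.0, 1.0, d)
    xx, yy = np.meshgrid(x, x)
    nx, ny = np.cos(theta), np.sin(theta)           # unit normal to the stripes
    along = velocity[0] * nx + velocity[1] * ny     # true normal component
    k = 2.0 * np.pi / wavelength
    pattern = lambda t: np.sin(k * (nx * xx + ny * yy - along * t))
    speed = normal_speed_2d(pattern(0.0), pattern(dt), d, d, dt)
    return np.nanmedian(np.abs(speed))

print(demo_1d(), demo_2d())
```

In the 2-D demonstration the grating translates at 3 units per second along x, but only the projection onto the stripe normal (3·cos 60° = 1.5) is recovered, which is the aperture problem in miniature. Adding noise to the frames before differencing would show the sensitivity mentioned above.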

A quantized version of the gradient scheme was proposed by Marr and Ullman (1981).
This model operates on locations in the images where the light intensity changes
significantly. Marr and Hildreth’s analysis (1980) showed that zero-crossings, that is locations
where the Laplacian of the image is zero, correspond closely to intensity edges in the original
image. Marr and Ullman track the motion of zero-crossings in the following way. An edge
detector S of the Marr and Hildreth type signals the absence or presence of a zero-crossing
at location x. This detector has two variants, one for transitions from dark to light (termed
a light-on edge) and one for light to dark transitions (light-off edge). A second type of
detector, termed a T unit, samples the temporal derivative of the intensity in approximately
the same patch of the visual field as the edge-detecting unit. One version of this unit,
T+, only signals when the temporal derivative is positive, that is when a light-on edge
moves to the left or a light-off edge moves to the right, whereas T− only responds to a
reduction in light intensity. Combining the output of an S and a T unit conjunctively yields a
set of detectors signaling the leftward (or rightward) motion of light-on or light-off edges. Marr
and Ullman tentatively identify the edge detecting S units with sustained X-like On- or
Off-center cells and the T+ and T− units with transient Y-like On- or Off-center cells.
Computer experiments on some images have shown that this gradient scheme can recover
motion information from image sequences. Note that their model, unlike other gradient
schemes, does not provide an estimate of the local velocity, but only its sign, that is
the direction of motion, although some measure of velocity could be extracted.¹
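
The conjunction of edge polarity and temporal-derivative sign can be sketched in one dimension. This is a deliberately simplified illustration, not the Marr and Ullman implementation: it locates the edge at the maximum of the spatial gradient rather than at a zero-crossing of a filtered image, works on raw intensities, and uses hypothetical names (`smoothed_edge`, `edge_motion_direction`).

```python
import numpy as np

def smoothed_edge(center, polarity=1.0, n=100, width=3.0):
    """Blurred step edge; polarity +1 is light-on (dark to light along +x),
    polarity -1 is light-off."""
    x = np.arange(n, dtype=float)
    return polarity * np.tanh((x - center) / width)

def edge_motion_direction(frame0, frame1):
    """Return (edge location, direction) from two signs only."""
    i_x = np.gradient(frame0)
    i_t = frame1 - frame0
    # Assumed stand-in for the S unit: strongest edge and its polarity.
    location = int(np.argmax(np.abs(i_x)))
    s = np.sign(i_x[location])     # +1 light-on edge, -1 light-off edge
    # Stand-in for the T units: sign of the temporal derivative at the edge.
    t = np.sign(i_t[location])     # +1 plays the role of T+, -1 of T-
    if s == 0 or t == 0:
        return location, "none"
    # T+ with light-on, or T- with light-off: leftward; otherwise rightward.
    return location, "left" if s * t > 0 else "right"

print(edge_motion_direction(smoothed_edge(50), smoothed_edge(51)))
print(edge_motion_direction(smoothed_edge(50, -1.0), smoothed_edge(49, -1.0)))
```

Only the direction of motion comes out of this combination, in line with the remark that the scheme yields the sign of the velocity rather than its magnitude.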

MOTION  PRIMITIVES  What  are  the  primitives  used  to  detect  and  measure  motion,  and 
at  what  stage  in  the  analysis  of  the  image  does  the  detection  of  motion  take  place?  For 
instance,  are  the  initial  measurements  of  the  light  intensity  in  the  photoreceptors  taken  as 
primitives,  or  are  the  measurements  extracted  after  the  filtering  and  smoothing  of  the  visual 
input  at  the  stage  of  the  retinal  ganglion  cells  or  even  cortical  cells?  Finally,  more  symbolic 
primitives  such  as  zero-crossings,  edges,  and  line  segments  or  even  endpoints,  corners, 
breaks,  local  deformities  of  objects,  or  discontinuities  in  line  orientation  could  also  be  used. 
The  advantage  of  matching  more  symbolic  tokens,  such  as  zero-crossings,  across  the  image 
is  that  these  tokens  mark  interesting  points  in  an  image,  for  instance  locations  where  the 
image  intensity  changes  most.  Tokens  are  generally  far  more  stable  to  changes  and  noise 
in  the  illumination  than  the  original  intensities  or  some  filtered  version  of  them.  Moreover, 
because  tokens  presumably  are  sparsely  distributed  in  the  image,  far  fewer  points  must  be 
matched  and  ambiguities  can  be  avoided.  If,  however,  large  areas  of  the  image  contain 
no  tokens,  for  instance  if  the  light  intensity  changes  little,  these  areas  will  not  have  any 
motion  measurements  assigned  initially  (these  areas  could  be  filled  in  later  on).  A  further 
disadvantage of symbolic primitives is that they must be unambiguously identified before
they  can  be  matched,  thus  preventing  an  early  computation  of  motion. 

For  the  visual  system  of  the  fly,  the  experimental  evidence  suggests  that  the  primitive 
is  simply  some  measure  of  local  intensity  flux  (Reichardt  et  al.,  1983).  For  the  short-range 
motion  system,  Hildreth  (1984)  discusses  the  evidence  that  motion  measurement  may  rely 
on the detection of the movement of features such as zero-crossings, or some similar measure
operating on the smoothed intensity values, and that the limits on spatial and temporal
displacements observed empirically in the short-range motion system are the consequence
of the limited spatial and temporal extent of the initial filtering (see also Marr and Ullman,
1981). Much more work needs to be done, however, before the question of the primitives used
by the motion system can be answered.

1 It can be shown formally that for small contrast amplitudes, the correlation model and the
gradient scheme are equivalent (T. Poggio, personal communication).

Detecting  Motion:  Psychophysics 

Both gradient and correlation schemes are local, involving only limited parts of the
visual scene, and are therefore likely to provide a dominant input to the short-range
process, which appears to operate on motion restricted to a spatial range of up to 10 to 15
minutes of visual arc and an interstimulus interval of less than 80 to 100 msec (Braddick, 1974,
1980). Because these separations in space and time are small, establishing correspondence
between items in consecutive images is considerably easier than in the long-range process
(see next section). Finally, the short-range process is assumed to operate directly on the
light intensities, filtered intensities, or on edges or zero-crossings. Interestingly, color seems
to  provide  little  if  any  input  to  the  short-range  process  (Ramachandran  and  Gregory,  1978). 
In  the  following,  we  discuss  the  (limited)  human  perception  evidence  that  has  been  used  to 
discriminate  between  the  various  models  of  motion  computation  discussed  above. 

One of the main properties of the Reichardt correlation model is that its output
responds not only to pattern velocity but also to structural properties of the pattern contrast.
This property allows the motion detector to be used as a pattern discriminator, at least in
flies (Reichardt et al., 1983; Reichardt and Guo, 1986). Specifically, it can be shown (for
instance, in Poggio and Reichardt, 1973, 1976) that the time-averaged response of the
correlation subunit depends on the ratio, for each spatial Fourier component, of the
pattern velocity v and the spatial wavelength λ of the stimulus used. Thus wavelength and
velocity trade off against each other and, as a consequence, the correlation model cannot
reliably measure the speed of movement. This property, first confirmed with behavioral
experiments for the fly, Musca domestica (Eckert, 1973), also seems to extend to the human
visual system. If subjects fixate a point while square or sinusoidal gratings of variable spatial
wavelength are moved past the fixation point at various speeds, their perception of velocity
depends linearly on both the speed and spatial frequency of the gratings (Diener et al.,
1976; Burr and Ross, 1982). These experiments seem consistent with a multiplication-like
second-order correlation model.
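
The dependence on the ratio v/λ can be made concrete with a small numerical sketch in
Python. It assumes the simplest form of the correlation model: two point-like inputs
separated by dx, a pure delay tau in place of the temporal filter, and an opponent
subtraction of the two mirror-symmetric subunits. These choices, the function name
reichardt_mean_response, and the parameter values are assumptions made for illustration and
are not taken from the papers cited above. Under these assumptions the time-averaged
response to a sinusoidal grating is sin(2π dx/λ) sin(2π v tau/λ), so velocity enters only
through v/λ.

```python
import numpy as np

def reichardt_mean_response(v, wavelength, dx=1.0, tau=1.0, n_samples=4096):
    """Time-averaged output of a two-input opponent correlator for a drifting
    sinusoidal grating (hypothetical minimal version with a pure delay)."""
    period = wavelength / abs(v)
    # Sample exactly one temporal period so that the average is unbiased.
    t = np.linspace(0.0, period, n_samples, endpoint=False)

    def signal(x, time):
        # Grating drifting with velocity v (positive means toward larger x).
        return np.sin(2.0 * np.pi * (x - v * time) / wavelength)

    a, b = signal(0.0, t), signal(dx, t)
    a_delayed, b_delayed = signal(0.0, t - tau), signal(dx, t - tau)
    # Multiply each delayed input with the undelayed neighbor, then subtract.
    return float(np.mean(a_delayed * b - a * b_delayed))

slow = reichardt_mean_response(v=1.0, wavelength=6.0)
fast = reichardt_mean_response(v=2.0, wavelength=6.0)
reverse = reichardt_mean_response(v=-1.0, wavelength=6.0)
```

With these values the slow and the fast grating produce the same mean output, which
illustrates why such a detector signals direction reliably (the sign flips for the reversed
grating) but cannot by itself report speed.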

A striking prediction of the original Reichardt model is motion inversion: if the
wavelength of the stimulus pattern is less than twice the separation between input channels, the
insect will perceive motion in the direction opposite to the true direction of motion
(Reichardt, 1969; Götz, 1972). Because humans, in contrast to insects, generally do not seem
to show spatial aliasing, the point-like receptive field assumption of the original correlation
model must be abandoned in favor of extended receptive fields (Fermi and Reichardt, 1963).
It can then be shown that motion reversal can be prevented (see, for instance, van Santen
and  Sperling,  1984).  Van  Santen  and  Sperling  (1984,  1985)  test  this  “elaborated  Reichardt” 
model  with  a  number  of  psychophysical  experiments.  In  particular,  by  varying  the  contrast 
of  neighboring  vertically  oriented  bars  moving  in  a  horizontal  direction,  they  show  that  the 
total  response  of  the  subject  depends  on  the  product  of  the  amplitudes  of  the  two  bars,  a 
finding  that  offers  support  for  the  multiplication  principle. 
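
Motion inversion and its suppression by extended receptive fields can be sketched in closed
form. The Python fragment below assumes a correlator with a pure delay tau and input
separation dx, for which the mean response to a sinusoidal grating is proportional to
sin(2π dx/λ) sin(2π v tau/λ), and it models an extended receptive field as a Gaussian of
width sigma in front of each input. This is a simplified stand-in, not the elaborated
Reichardt model of van Santen and Sperling; the function name, the amplitude parameters
amp_a and amp_b, and all numerical values are hypothetical.

```python
import math

def correlator_response(wavelength, velocity, dx=1.0, tau=0.25,
                        sigma=0.0, amp_a=1.0, amp_b=1.0):
    """Closed-form mean response of a delay-and-multiply correlator.

    sigma is the standard deviation of an assumed Gaussian receptive field in
    front of each input; sigma=0 corresponds to point-like inputs.
    """
    # A Gaussian of width sigma attenuates a sinusoid of this wavelength by:
    attenuation = math.exp(-2.0 * (math.pi * sigma / wavelength) ** 2)
    geometry = math.sin(2.0 * math.pi * dx / wavelength)       # spatial sampling
    timing = math.sin(2.0 * math.pi * velocity * tau / wavelength)
    # The two input amplitudes enter as a product (multiplication principle).
    return amp_a * amp_b * attenuation ** 2 * geometry * timing

coarse = correlator_response(wavelength=4.0, velocity=1.0)   # wavelength > 2*dx
fine = correlator_response(wavelength=1.5, velocity=1.0)     # wavelength < 2*dx
fine_extended = correlator_response(wavelength=1.5, velocity=1.0, sigma=1.0)
```

With point-like inputs the fine grating yields a response of the opposite sign, that is, an
apparent reversal of motion. With the Gaussian receptive field the aliased response becomes
negligibly small while the response to the coarse grating remains clearly positive; in this
simple sketch the reversal is attenuated rather than eliminated.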

Psychophysical  evidence  in  favor  of  the  gradient  scheme  is  presented  by  Moulden  and 
Hogg (1984). In one particularly ingenious experiment, they show polarity- and
direction-specific effects on motion discrimination in response to adaptation to a non-moving,
spatially homogeneous stimulus, and provide evidence for channels tuned to detect an increase
or decrease in the light intensity (the T+ and T- units of Marr and Ullman, 1981).

Thus,  the  current  psychophysical  evidence  does  not  decisively  favor  one  particular 
theory. 

Detecting  Motion:  Circuitry  and  Biophysics 

Having  described  some  of  the  algorithms  proposed  to  underlie  motion  detection,  we  now 
discuss  in  more  detail  the  biophysical  mechanisms  that  may  be  used  for  motion  detection. 
Numerous  nerve  cells  in  the  visual  system  of  both  invertebrates  and  vertebrates  respond 
differentially  to  motion.  Moving  a  visual  stimulus,  say  a  dark  bar  on  a  light  background, 
in  the  preferred  direction  elicits  a  vigorous  response  from  the  cell,  whereas  movement  in 
the  opposite  direction,  termed  the  null  direction,  yields  no  significant  response.  Direction 
selective  cells,  first  described  in  the  frog’s  retina  in  a  classical  paper  by  Maturana  et  al. 
(1960),  have  subsequently  been  identified  in  the  third  optic  ganglion  of  the  house  fly  (for  a 
review  of  the  extensive  literature  see  Hausen,  1982a, b),  in  the  retina  of  pigeons  (Maturana 
and  Frenk,  1963;  Holden,  1977),  rabbits  (Barlow,  Hill  and  Levick,  1964;  Barlow  and  Levick, 
1965),  ground  squirrels  (Michael,  1966),  and  cats  (Stone  and  Fabian,  1966;  Cleland  and 
Levick, 1974), and in the visual cortex of both cats and monkeys (Hubel and Wiesel, 1959,
1962; Schiller, Finlay and Volman, 1976; Orban, Kennedy and Maes, 1981). Analyzing
these cells affords us the opportunity to study the elementary biophysical events underlying
a well-characterized but nonlinear (that is, nontrivial) operation in single nerve cells.

In  most  mammals,  except  cats  and  primates,  the  first  cells  that  seem  to  discriminate  the 
direction  of  motion  are  the  retinal  ganglion  cells.  Thus,  in  the  rabbit’s  retina  approximately 
one  quarter  of  the  ganglion  cells  can  be  described  as  direction  selective.  In  the  cat  retina, 
however, less than 1% of the physiologically identified ganglion cells are direction selective
(Rodieck, 1979), while no such cells have been reported in the monkey’s retina.2 Because
neither cells in the A and A1 layers of the lateral geniculate nucleus (LGN) of the cat nor
cells in the magno- and parvocellular layers in the monkey are strongly direction selective,
the appearance of substantial numbers of direction selective neurons in the primary cortex
of both animals strongly suggests that this property arises first in the cortex.

2 Due to the inevitable electrode bias, this does not necessarily imply such cells do not exist in
the primate retina.

COMPUTING  THE  DIRECTION  OF  MOTION  IN  THE  RETINA:  Early  Experiments 
Barlow  and  Levick  (1965)  systematically  explored  directional  selectivity  in  the  retina  of 
the  rabbit  by  using  extracellular  recordings.  About  20%  of  the  ganglion  cells  in  the  visual 
streak  give  both  On  and  Off  responses  to  stationary,  flashed  stimuli  and  are  direction  selec¬ 
tive  for  moving  stimuli.  These  cells  therefore  compute  the  direction  of  motion  independent 
of  the  contrast  of  the  stimulus  (i.e.  dark  stimulus  on  a  light  background  or  vice  versa). 
A smaller proportion of ganglion cells (about 7%) are direction selective and of the On-type,
that  is,  they  respond  only  to  light-on  edges.  These  cells  project  to  the  accessory  optic 
system  in  the  midbrain  and  are  believed  to  be  crucial  for  the  control  of  the  optokinetic  nys¬ 
tagmus  (Oyster,  Takahashi  and  Collewijn,  1972)  and  image  stabilization  (Simpson,  1984). 
Off-type direction selective cells have not been reported in either the rabbit or the cat, although
they  are  found  in  the  turtle.  Two  important  conclusions  can  be  drawn  from  Barlow  and 
Levick’s  (1965)  report.  First,  inhibition  is  crucial  for  direction  selectivity.  On  the  basis 
of  this  evidence  Barlow  and  Levick  proposed  that  sequence  discrimination  is  based  upon 
a  scheme  whereby  the  response  to  the  null  direction  is  vetoed  by  appropriate  neighboring 
inputs (the AND NOT gate in Figure 1b). Directionality is achieved by an asymmetric
delay  —  or  by  a  low  pass  filter  —  between  excitatory  and  inhibitory  channels  from  the 
photoreceptors  to  the  ganglion  cell.  This  model  can  be  considered  as  an  instance  of  the 
Reichardt  correlation  model.  Second,  this  veto  operation  must  occur  within  small  indepen¬ 
dent  subunits  distributed  throughout  the  receptive  field  of  the  cell,  because  movement  of  a 
bar  over  0.25°  to  0.5°  elicits  a  direction  selective  response  (whereas  the  whole  receptive  field 
subtends  4.5°;  Barlow  and  Levick,  1965).  Thus,  the  site  of  the  veto  operation  is  extensively 
replicated  throughout  the  receptive  field  of  the  direction  selective  cell.  Confirming  evidence 
for  the  critical  role  of  inhibition  comes  from  experiments  in  which  inhibition  is  blocked  with 
pharmacological  agents  (Caldwell,  Daw  and  Wyatt,  1978;  Ariel  and  Daw,  1982;  Ariel  and 
Adolph,  1985),  a  situation  that  results  in  an  equal  response  for  both  preferred  and  null 
directions  (see  below). 
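
The veto scheme can be illustrated with a discrete-time Python sketch. It assumes a row of
binary receptors, one subunit per neighboring pair, and a delay of one time step on the
inhibitory channel; the stimulus is a single spot stepping across the row. The function
names (moving_spot, veto_subunit_responses) and the layout are hypothetical and serve only
to show how an AND NOT gate with an asymmetric delay yields direction selectivity within
small independent subunits.

```python
def moving_spot(n_receptors, direction):
    """Frames of a single spot stepping across the receptors.
    direction > 0 moves toward higher indices, otherwise toward lower ones."""
    if direction > 0:
        order = range(n_receptors)
    else:
        order = range(n_receptors - 1, -1, -1)
    return [[1 if i == p else 0 for i in range(n_receptors)] for p in order]

def veto_subunit_responses(frames, delay=1):
    """Total number of subunit responses over all frames.

    Subunit i is excited by receptor i and vetoed by the delayed signal of its
    neighbor i + 1 (excitation AND NOT delayed inhibition).
    """
    n = len(frames[0])
    total = 0
    for t, frame in enumerate(frames):
        for i in range(n - 1):
            excitation = frame[i]
            inhibition = frames[t - delay][i + 1] if t >= delay else 0
            total += int(excitation and not inhibition)
    return total

preferred = veto_subunit_responses(moving_spot(5, direction=+1))
null = veto_subunit_responses(moving_spot(5, direction=-1))
```

In the null direction the spot reaches the inhibitory receptor first, so the delayed veto
arrives together with the excitation and every subunit stays silent; in the preferred
direction the veto arrives too late and each subunit responds once.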

A  Biophysical  Model  We  can  now  ask  how  this  operation  is  implemented  at  the  level  of 
the  hardware,  i.e.  at  the  level  of  retinal  cells.  Torre  and  Poggio  (1978)  proposed  a  specific 
biophysical  mechanism  implementing  the  neural  equivalent  of  a  veto  operation. 

When  two  neighboring  regions  of  a  dendritic  tree  experience  simultaneous  conductance 
changes,  induced  by  synaptic  inputs,  the  resulting  postsynaptic  potential  is  generally  not 
the  sum  of  the  potentials  generated  by  each  synapse  alone;  that  is,  synaptic  inputs  may 
interact in a highly nonlinear fashion. This is particularly true for an inhibitory synaptic
input that increases the membrane conductance with an associated ionic battery that
reverses at, or very near, the resting potential Erest of the cell. Activating this type of
inhibition, called silent or shunting inhibition, is similar to opening a hole in the
membrane: its effect is only noticed if the intracellular potential is substantially different from
Erest. Torre and Poggio (1978) showed in a lumped electrical model of the membrane of
the cell that silent inhibition can effectively cancel the excitatory postsynaptic potential
(EPSP) induced by an excitatory synapse without hyperpolarizing the membrane.
Moreover, for small synaptic conductance inputs the interaction between excitation and silent
inhibition is multiplication-like, thereby approximating the nonlinear operation underlying
the correlation scheme (see legend to Figure 1c). Pairs of excitatory and inhibitory
synapses distributed throughout the dendritic tree may compute the direction of motion
at many independent sites throughout the receptive field of the cell, in agreement with the
physiological data. Because nonlinearity of the interaction is an essential requirement of this
scheme, Torre and Poggio suggest that the optimal locations for excitation and inhibition
are fine distal dendrites or spines of the direction selective ganglion cell.
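
The multiplication-like character of silent inhibition can be seen in a single-compartment,
steady-state sketch in Python. All potentials are measured relative to Erest, so a shunting
synapse has a reversal potential of zero. The conductance and battery values are arbitrary
illustrative numbers, not parameters from Torre and Poggio (1978), and the function name is
hypothetical.

```python
def steady_state_potential(g_exc, g_inh, e_exc=60.0, e_inh=0.0, g_leak=1.0):
    """Steady-state potential (relative to rest) of a lumped membrane patch
    with a leak, an excitatory and an inhibitory conductance."""
    return (g_exc * e_exc + g_inh * e_inh) / (g_leak + g_exc + g_inh)

# Silent inhibition alone leaves the potential at rest.
silent_alone = steady_state_potential(0.0, 0.5)

# A strong shunt cancels most of the EPSP without hyperpolarizing the cell.
epsp = steady_state_potential(0.1, 0.0)
epsp_shunted = steady_state_potential(0.1, 10.0)

# For small conductances, the part of the response that is not the sum of the
# separate responses is close to -g_exc * g_inh * e_exc / g_leak**2: a product.
g_e, g_i = 0.01, 0.01
interaction = (steady_state_potential(g_e, g_i)
               - steady_state_potential(g_e, 0.0)
               - steady_state_potential(0.0, g_i))
product_term = -g_e * g_i * 60.0 / 1.0 ** 2
```

In this lumped description the interaction term is the leading nonlinear contribution, and
it is proportional to the product of the excitatory and inhibitory conductances.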

Because  this  analysis  left  out  the  precise  conditions  required  to  produce  effective  and 
specific  nonlinear  interactions  in  a  dendritic  tree,  Koch,  Poggio  and  Torre  (1982,  1983)  used 
one-dimensional  cable  theory  to  analyze  the  interaction  between  time-varying  excitatory 
and inhibitory synaptic inputs in a morphologically characterized cat retinal ganglion cell
(of the δ type; see Boycott and Wässle, 1974). They were able to prove rigorously, in the
case of steady-state synaptic conductance inputs, that in a passive and branched dendritic
tree  the  most  effective  location  for  silent  inhibition  (most  effective  in  terms  of  reducing  an 
EPSP)  must  always  be  on  the  direct  path  between  the  location  of  the  excitatory  synapse 
and  the  soma. 

Detailed  biophysical  simulations  of  highly  branched  and  passive  neurons  show  that  this 
on-the-path  condition  can  be  quite  specific.  If  the  amplitude  of  the  inhibitory  conductance 
change  is  above  a  critical  value,  inhibition  can  reduce  excitation  by  as  much  as  a  factor  of 
10,  as  long  as  inhibition  is  located  between  the  excitatory  synapse  and  the  soma.  Inhibition 
more than about 10 µm behind excitation or on a neighboring branch 10 or 20 µm off the
direct path is ineffective in reducing excitation significantly. This specificity in terms of
spatial positioning of excitatory and inhibitory synapses carries over into the temporal
domain. For maximal effect, inhibition must last at least as long as excitation and the
inhibitory and excitatory conductance changes must occur nearly synchronously (Koch et
al., 1983; Segev and Parnas, 1983). Finally, the on-the-path condition is also valid in the
presence of action potentials: in order for silent inhibition to block the propagation of a spike
past a branching point, it must be located at most 5 µm from the branch point (O’Donnell,
Koch and Poggio, 1985). Because such a precise mapping imposes stringent conditions on
the  specificity  of  the  positioning  of  synapses  during  development  of  the  retinal  circuitry, 
one  simple  developmental  rule  to  follow  is  that  a  pair  of  excitatory  and  inhibitory  inputs 
originating  from  interacting  photoreceptors  should  contact  the  ganglion  cell  dendrite  close 
to  one  another. 
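
The on-the-path effect can be reproduced qualitatively in a toy steady-state model. The
Python sketch below assumes an unbranched chain of four passive compartments with the soma
at index 0, an excitatory conductance in compartment 2, and a shunting conductance placed
either between excitation and soma or distal to excitation. The geometry, the conductance
values, and the function name soma_potential are hypothetical and are not taken from the
simulations of Koch et al.; potentials are relative to rest.

```python
import numpy as np

def soma_potential(inh_site=None, n=4, exc_site=2, g_leak=1.0, g_axial=10.0,
                   g_exc=1.0, e_exc=60.0, g_inh=50.0):
    """Steady-state potential at the soma (compartment 0) of a passive chain."""
    A = np.zeros((n, n))   # conductance matrix of the current-balance equations
    b = np.zeros(n)        # current sources
    for i in range(n):
        A[i, i] += g_leak
        if i + 1 < n:      # axial coupling between neighboring compartments
            A[i, i] += g_axial
            A[i + 1, i + 1] += g_axial
            A[i, i + 1] -= g_axial
            A[i + 1, i] -= g_axial
    A[exc_site, exc_site] += g_exc      # excitatory synapse with battery e_exc
    b[exc_site] += g_exc * e_exc
    if inh_site is not None:            # shunt: battery at rest, no source term
        A[inh_site, inh_site] += g_inh
    return float(np.linalg.solve(A, b)[0])

no_inhibition = soma_potential()
on_path = soma_potential(inh_site=1)   # between the excitatory synapse and soma
distal = soma_potential(inh_site=3)    # behind the excitatory synapse
```

With these values the on-path shunt reduces the somatic EPSP by more than a factor of ten,
whereas the same conductance placed distally is several times less effective. The sketch
shows only the ordering; the sharp spatial specificity reported in the text depends on the
branched morphology and parameters of the real cell.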

The specificity of silent inhibition contrasts with the action of a hyperpolarizing synaptic
input (i.e. a conductance change with an associated battery below Erest). In this case,
the interaction between excitation and inhibition will be much more linear, that is, the
inhibitory synapse will reduce the EPSP generated by the excitatory synapse by an amount
roughly proportional to the inhibitory conductance change with less regard to the relative
spatial positioning of excitatory and inhibitory synapses (Koch et al., 1982; O’Donnell,
Koch and Poggio, 1985; Koch and Poggio, 1986).

Critical  Predictions  of  the  Model  How  does  the  model  fare  against  experimental  evidence? 
The following lists some of the most important predictions:

• On-Off direction selective cells receive distinct excitatory and inhibitory synaptic
inputs. The reversal potential of the inhibitory input is close to the resting potential of
the cell (probably acting via a GABA-A receptor).

• Bicuculline should abolish direction selectivity.

•  Inhibitory  synapses  are  not  more  distal  to  the  soma  than  excitatory  synapses. 

•  Direction  selectivity  is  computed  at  many  independent  sites  in  the  dendritic  tree  before 
spike  initiation  at  the  axonal  hillock. 

• The direction selective cell should show a δ-like morphology, with a highly branched,
bistratified  dendritic  tree  with  small  diameter  dendrites  or  possibly  spines. 

•  On-Off  direction  selective  cells  are  expected  to  show  little  interaction  between  a  dark 
bar/spot  and  light  bar/spot  moving  in  opposite  directions  within  the  receptive  field. 

Currently,  the  main  support  for  this  hypothesis  derives  from  intracellular  recordings  in 
retinal  ganglion  cells  from  the  turtle  (Marchiafava,  1979)  and  the  bullfrog  (Watanabe  and 
Murakami,  1984).  Moving  a  spot  or  bar  in  the  preferred  direction  gives  rise  to  a  somatic 
EPSP  with  superimposed  action  potentials  whereas  null  direction  stimulation  results  in  a 
smaller  EPSP  without  a  hyperpolarization.  The  reduced  somatic  EPSP  in  the  null  direction 
appears  to  be  caused  by  an  inhibitory  process  that  increases  the  membrane  conductance  with 
an  associated  reversal  potential  at  or  very  near  the  resting  potential  of  the  cell.  This  silent 
inhibition  is  revealed  by  injecting  a  steady-state  depolarizing  current  into  the  soma,  giving 
rise  to  a  hyperpolarization  (see  Figure  3).  Preliminary  evidence  from  rabbit  ganglion  cells 
indicates  the  presence  of  a  similar  inhibitory  input  (F.  Amthor,  personal  communication). 

Figure 3. (a) The effect of intracellular current injection upon the photoresponse in an
intracellularly recorded direction selective turtle ganglion cell. The responses in the preferred
and null directions are shown in the left and right parts of (a). The lower record shows the
photoresponse while 0.23 nA current was being injected into the soma. Adapted from
Marchiafava (1979). (b) Simulated intracellular potential at the soma of the reconstructed
rabbit On-Off direction selective ganglion cell shown in Figure 4, assuming a purely passive
membrane. The two distinct peaks correspond to the leading edge, receiving On input, and
the trailing edge, receiving Off input. In the bottom half, a step current of 0.091 nA was being
injected into the soma. Preferred direction is left and null direction right. From Koch et al.
(1986b).

Within the last few years, two groups have determined the morphological structure
of On-Off direction selective ganglion cells. Using a fluorescent stain, Jensen and DeVoe
(1983) visualized these cells in the turtle retina, and Amthor, Oyster and Takahashi (1984)
used horseradish peroxidase (HRP) in the rabbit. The overall morphology of these cells
is similar in the two species. Rabbit direction selective ganglion cells have several distinct
features that allow visual identification on purely morphological grounds (Figure 4a). (1)
These cells have two levels of dendritic ramification. This observation is consistent with
studies that have divided the inner plexiform layer into On and Off laminae (Famiglietti
and Kolb, 1976). (2) The dendritic branches of the direction selective cells are of very small
diameter relative to other rabbit ganglion cells. Moreover, the dendrites carry spines or
spine-like structures. (3) The dendritic branching pattern is quite complex, with dendrites
forming apparent loops. Note that although the cell drawn in Figure 4a has an asymmetric
placement of the soma with respect to the dendritic tree, preferred and null direction do
not appear to be predictable from the gross dendritic morphology of these cells. Thus, the
morphology of direction selective cells agrees well with previous predictions (Koch et al.,
1982).

In order to model massive synaptic input to a direction selective ganglion cell, the
passive electrical properties of the anatomically reconstructed cell shown in Figure 4a were
simulated on the basis of one-dimensional cable theory (O’Donnell, Koch and Poggio, 1986;
Koch, Torre and Poggio, 1986). The computation of the voltages is carried out by a circuit
simulation program, SPICE, first applied to biophysical circuit modeling by Segev et al.
(1985). Figure 3 shows the resulting somatic depolarization in the absence and in the
presence of a depolarizing current step injected at the soma, in comparison with experimental
records obtained from turtle ganglion cells (Marchiafava, 1979). The intracellular potential
can also be displayed in color throughout the entire cell (O’Donnell, Koch and Poggio, 1986;
Koch et al., 1986).

Presynaptic Circuitry How much do we know about the origin and properties of the
excitatory and inhibitory inputs to direction selective cells? Considerable evidence implicates
acetylcholine (ACh) as the excitatory neurotransmitter underlying direction selectivity in
the rabbit retina (Ariel and Adolph, 1985). If all synaptic transmission in the perfused
retina is blocked by pharmacological manipulation of the bathing medium, On-Off
direction selective cells can be driven by direct application of ACh, thus implying that these
cells are the postsynaptic target for cholinergic synapses. Ariel and Daw (1982) found that
upon application of physostigmine, a drug that inhibits the hydrolysis of ACh after it has
bound to the postsynaptic membrane, ganglion cells lose their ability to discriminate
motion. Other properties like speed and size specificity and radial grating inhibition do not
seem to be affected. This result may at first seem paradoxical, because physostigmine
increases the effectiveness of ACh. One simple explanation is that this increased effectiveness
during null direction stimulation serves to overcome the inhibition and to initiate action
potentials at the soma. In turtle retina, similar experiments yield similar results (Ariel and
Adolph, 1985).

Recently, Masland and colleagues (Masland, Mills and Cassidy, 1984; Tauchi and
Masland, 1984) identified two unique populations of cholinergic amacrine cells. In the
rabbit retina, the only cells synthesizing and releasing ACh are two groups of amacrine cells
distributed in the On and Off layers. Using radioactively labeled ACh, Masland, Mills and
Cassidy demonstrated that these two subtypes of amacrine cells release ACh transiently
either at the onset (cells in the On layer) or at the offset of light (cells in the Off layer).
Because the cells have a unique morphology reminiscent of fireworks, they are called starburst
amacrine cells. These cells appear to be presynaptic to bistratified ganglion cells, with the
morphological attributes of the direction selective cells of Amthor et al. (1984).

The inhibitory input for motion discrimination is believed to be mediated by the
neurotransmitter γ-aminobutyric acid (GABA). Caldwell et al. (1978) and Ariel and Daw
(1982) infused picrotoxin, a potent antagonist of GABA, into the rabbit retina. Within
minutes after the start of drug infusion, the response of direction selective cells in the null
direction increased dramatically, so that the cell became equally responsive to movement in
both directions. A few minutes after drug infusion was discontinued, the cell again became
direction selective. In the turtle retina, direct application of ACh leads to spontaneous
firing in direction selective cells, during blockage of synaptic transmission via a low calcium
concentration and EGTA (Ariel and Adolph, 1985). This ACh-induced spike activity can
be suppressed by GABA, thus indicating that both ACh and GABA receptors must coexist
on the membrane of turtle direction selective ganglion cells. In the rat retina, the only cells
staining for glutamic acid decarboxylase (GAD; the rate-limiting enzyme for the synthesis
of GABA) are amacrine cells (Vaughn et al., 1981). These cells make synapses onto
processes of bipolar, amacrine and ganglion cells in descending order of frequency.

Thus,  at  least  in  the  turtle  and  rabbit  retina,  the  excitatory  and  the  inhibitory  inputs  to 
direction  selective  ganglion  cells  appear  to  derive  from  cholinergic  and  GABAergic  amacrine 
cells.  This  finding  does  not  exclude,  however,  direct  input  from  bipolar  cells  that  may  be 
responsible,  for  instance,  for  the  center-surround  organization  of  direction  selective  cells. 

Alternative  Models  What  are  the  alternative  models  for  the  neuronal  operations  underly¬ 
ing  motion  discrimination?  If  one  assumes  that  direction  selectivity  is  first  expressed  at  the 
level of the ganglion cells, then the experimental evidence of Barlow and Levick (1965) and
the intracellular recordings of Marchiafava (1979) and Watanabe and Murakami (1984) in
conjunction with the pharmacology (Ariel and Adolph, 1985) argue in favor of our
postsynaptic, silent inhibition scheme. Although both Werblin (1970) and Marchiafava (1979)
have  failed  to  record  direction  selective  responses  in  bipolar  or  amacrine  cells,  the  possibility 
that  the  critical  computations  occur  presynaptic  to  the  ganglion  cell  cannot  be  excluded. 
Indeed,  DeVoe  and  his  collaborators  (DeVoe,  Guy  and  Criswell,  1985)  have  recorded  from 
direction  selective  amacrine  and  bipolar  cells  in  the  retina  of  the  turtle.  Their  evidence 
points  toward  an  alternative  or  coexistent  presynaptic  site  for  the  critical  computation  un¬ 
derlying  direction  selectivity  in  the  turtle.  A  second  piece  of  evidence  favoring  a  presynaptic 
arrangement  is  the  influence  of  GABA  on  ACh.  GABA  inhibits  the  light  evoked  release  of 
ACh in the rabbit retina (Massey and Neal, 1979; see Figure 4).

Other  classes  of  presynaptic  models  for  motion  discrimination  have  been  proposed 
(Dowling,  1979;  Koch  and  Poggio,  1986;  Koch  et  al.,  1986):  Because  GABAergic  processes 
synapse  onto  bipolar,  amacrine,  and  ganglion  cells,  the  site  of  the  critical  computation  un¬ 
derlying  direction  selectivity  could  either  be  a  bipolar  cell  exciting  the  starburst  amacrine 
cell  or  the  starburst  amacrine  cell  itself.  Starburst  amacrine  cells  have  dendrites  that  are 
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Figure  4.  (a)  Camera  lucida  drawing  of  an  HRP-injected  On-Off  direction  selective  cell  in  the 
visual  streak  of  the  rabbit  retina.  The  dendritic  fields  have  been  drawn  in  two  parts:  “outer” 
refers  to  the  part  of  the  inner  plexiform  layer  (1PL)  closest  to  the  inner  nuclear  layer,  where 
the  cells  of  the  Off  pathway  make  synaptic  connections,  while  “inner"  is  the  layer  closest  to  the 
ganglion  cell  layer  where  the  the  On  pathway  is  connected.  There  are  no  obvious  asymmetries 
in  the  ceil  that  are  correlated  with  the  preferred  direction.  Adapted  from  Amthor  et  al.  (1984) 
(b)  A  simplified  schematic  of  the  excitatory  pathway  from  the  outer  plexiform  layer  (OPL)  to 
the  On-Off  direction  selective  ganglion  cell  in  the  rabbit.  Depolarizing  (On)  and  Hyperpolarizing 
(Off)  bipolar  cells  convey  the  visual  information  from  the  OPL  to  the  On  or  Off  part  of  the  IPL 
Here  they  most  likely  synapse  either  directly,  possibly  using  glutamate  or  aspartate  as  excitatory 
neurotransmitter,  or  indirectly,  via  other  amacrine  cells,  onto  the  cholinergic  starburst  amacrine 
cells.  These  amacrine  cells  feed  in  turn  directly  onto  the  bistratified  On-Off  ganglion  cells  (c) 
Possible  sites  for  the  computations  underlying  motion  discrimination.  GABAergic  amacrine  cells 
can  veto  the  excitatory  pathway  either  at  the  level  of  the  ganglion  cell  (1),  at  the  starburst  amacrine 
cells  (2)  or  bipolar  cells  (3).  Current  evidence  seems  to  favor  site  (1).  The  On  and  Off  pathways 
are  segregated  up  to  the  cell  body  of  the  On-Off  direction  selective  cell.  From  Koch  et  al.  (1986b) 
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probably  decoupled  from  each  other  and  the  soma  (Miller  and  Bloomfield,  1983).  Only 
the  distal-most  portion  of  the  dendrites  give  rise  to  conventional  chemical  synaptic  output, 
whereas  the  bipolar  and  amacrine  cell  input  is  distributed  throughout  the  cell  (Famiglietti, 
1983).  Thus,  each  dendrite  may  behave  from  an  electrical  point  of  view  as  an  independent 
subunit,  acting  as  the  morphological  basis  of  Barlow  and  Levick’s  subunits  (1965).  At  least 
two  biophysical  mechanisms  could  underlie  direction  selectivity:  1)  the  AND  NOT  veto 
scheme, now implemented at the level of bipolar or amacrine cells, or 2) a linear interaction
between an excitatory synapse and a hyperpolarizing synapse followed by synaptic
rectification (Koch and Poggio, 1986). In this case, the nonlinearity essential for direction
selectivity (Poggio and Reichardt, 1973) would be implemented by a synaptic transduction
mechanism that only allows transmission of depolarizing events. For these presynaptic
models,  the  release  of  neurotransmitter,  whether  from  the  bipolar  onto  the  amacrine  cell  or 
from  the  amacrine  onto  the  ganglion  cell,  would  in  itself  be  direction  selective. 
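
The logic of the AND NOT veto scheme can be illustrated with a deliberately abstract sketch. It ignores all biophysics: receptor activity is binary, time is discrete, and each subunit is excited by one receptor and vetoed by the delayed signal of its right-hand neighbor. The wiring, the one-step delay and the function names are assumptions made for illustration only.

```python
import numpy as np

def veto_response(stimulus, delay=1):
    """Total output of a row of AND NOT subunits (illustrative sketch).

    stimulus: array of shape [time, position] holding 0/1 receptor activity.
    Subunit x is excited by receptor x and vetoed by the signal that
    receptor x + 1 produced `delay` time steps earlier.
    """
    stimulus = np.asarray(stimulus, dtype=bool)
    n_times, n_positions = stimulus.shape
    excitation = stimulus[:, :-1]
    # Delayed copy of each right-hand neighbor's activity.
    delayed_veto = np.zeros((n_times, n_positions - 1), dtype=bool)
    delayed_veto[delay:] = stimulus[:n_times - delay, 1:]
    # A subunit fires only if excited AND NOT vetoed.
    return int(np.sum(excitation & ~delayed_veto))

def moving_spot(n_positions, direction):
    """Spot that advances one position per time step (+1 rightward, -1 leftward)."""
    stimulus = np.zeros((n_positions, n_positions), dtype=int)
    for t in range(n_positions):
        x = t if direction > 0 else n_positions - 1 - t
        stimulus[t, x] = 1
    return stimulus

preferred = veto_response(moving_spot(8, +1))  # veto arrives too late to coincide
null = veto_response(moving_spot(8, -1))       # veto coincides with excitation
```

In this sketch a spot moving leftward reaches the vetoing receptor one step before the excitatory one, so the delayed veto coincides with excitation and the response is suppressed; for rightward motion the two signals never coincide.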

We  would  like  to  point  out  that  both  pre-  and  postsynaptic  models  may  turn  out  to  be 
correct.  For  instance,  the  direction  selective  bipolar  and  amacrine  cells  recorded  by  DeVoe 
et  al.  (1985)  have  a  smaller  velocity  range  than  direction  selective  ganglion  cells.  Thus, 
a rough estimate of the direction of a moving stimulus could be computed at the level of
bipolar/amacrine  cells  while  ganglion  cells  would  perform  similar  but  finer  measurements. 

COMPUTING  MOTION  IN  THE  VISUAL  CORTEX  Much  more  work  has  been  done  on 
the  biophysical  mechanisms  underlying  direction  selectivity  in  the  retina  than  in  the  cortex. 
Therefore,  our  discussion  of  cortical  mechanisms  will  necessarily  be  brief.  As  mentioned 
above,  cells  in  the  primary  visual  cortex  of  cats  and  primates  are  likely  to  compute  the 
direction  of  motion,  because  the  geniculate  input  shows  no  evidence  of  direction  selectivity. 
Moreover,  if  the  inhibition  mediated  by  local  interneurons  is  removed  by  application  of 
bicuculline,  an  antagonist  of  GABA  (Sillito,  1977;  Sillito  et  al.,  1980),  direction  selectivity 
of cortical cells is severely reduced or abolished.3 This experiment, similar to Ariel and
Daw’s experiment in the retina (1982), underscores the importance of inhibition for direction
discrimination. 

An  extension  of  the  veto  mechanism  outlined  above  has  been  proposed  to  underlie 
direction  selectivity  in  the  visual  cortex  (Poggio,  1982;  Koch  and  Poggio,  1985).  The  basic 
idea  is  as  follows:  A  single  LGN  On-center  neuron  (or  a  row  of  such  cells)  excites  a  cell  in 
area V1 whenever a light-on stimulus falls within its receptive field center. A neighboring
On-center  LGN  cell  reduces  the  activity  of  the  cortical  neuron  by  a  delayed  silent  inhibition. 
Because  it  is  unlikely  that  LGN  cells  have  an  inhibitory  effect  on  their  postsynaptic  targets, 
the  second  geniculate  cell  excites  an  interneuron,  possibly  in  layer  4c,  which  in  turn  inhibits 
the  direction  selective  cell.  This  seems  plausible  in  light  of  the  fact  that  direction  selective 

3 The crucial nature of inhibition for motion discrimination seems to be well preserved across
species. Injecting picrotoxin, a GABA antagonist, into the third optic ganglion of the blowfly,
Calliphora erythrocephala, abolishes motion discrimination at both the cellular and the behavioral
level (Bülthoff and Bülthoff, 1986).
cells in the primate cortex first occur one synapse beyond layer 4c, i.e. in layer 4b (Dow,
1974). If the silent inhibition is located either on the direct path between excitation and
the  soma  or  very  near  the  excitatory  synapse,  it  will  effectively  veto  excitation  in  the  null 
direction.  Adding  a  similar  but  inverted  circuit  constructed  of  geniculate  Off-center  neurons 
endows  our  cortical  neuron  with  direction  selectivity  for  both  light-on  and  light-off  edges 
moving  in  the  same  direction  —  the  most  common  type  of  direction  selective  cell  (the  S2 
cell of Schiller et al., 1976). These Off-center neurons, whose receptive fields overlap with
the  fields  of  their  On-center  counterparts,  map  onto  a  different  part  of  the  dendritic  tree 
of  the  direction  selective  cortical  cell.  This  prediction,  i.e.  that  direction  selectivity  for 
light-on  and  light-off  edges  results  from  the  independent  convergence  from  the  geniculate, 
is  supported  by  experiments  done  by  Schiller  (1982)  in  the  monkey  and  by  Sherk  and 
Horton  (1984)  in  the  cat,  using  the  pharmacological  agent  APB.  APB  infusion  into  the 
retina  reversibly  blocks  the  On  pathway  at  the  level  of  the  retinal  outer  plexiform  layer  and 
eliminates  the  response  of  the  cortical  direction  selective  cell  to  light  edges  while  leaving 
the  response  to  dark  edges  intact. 

One  intriguing  possibility  is  that  dendritic  spines  might  be  the  specialized  sites  for 
the synaptic veto operation to take place. Between 5 and 20% of spines on cortical cells have been
reported  to  carry  symmetrical  and  asymmetrical  synaptic  profiles  on  the  same  spine  (see, 
for  instance,  Jones  and  Powell,  1969;  Sloper  and  Powell,  1979).  Such  an  arrangement  can 
be  used  to  perform  a  highly  tuned  temporal  discrimination  operation,  essentially  without 
influencing  the  rest  of  the  neuron  (Koch  and  Poggio,  1983).  With  a  fast  excitatory  and 
a much slower inhibitory conductance change simultaneously occurring on the same spine,
inhibition will effectively veto excitation if it sets in before the start of excitation (null direction).
Activating the inhibition some fraction of a millisecond after the start of excitation
will  not  influence  excitation  to  any  significant  degree  (preferred  direction). 
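
The dependence of the veto on relative timing can be illustrated with a single passive compartment driven by a fast excitatory and a slow silent inhibitory conductance. This is only a sketch: the compartment stands in for a spine, the inhibitory reversal potential is placed at rest, and every parameter value (time constants, conductances, the 80 mV driving potential, units of ms and mV) is an assumption chosen for illustration rather than a measured quantity.

```python
import numpy as np

def alpha_conductance(t, onset, tau, g_max):
    """Alpha-function conductance that starts at `onset` and peaks at onset + tau."""
    s = np.clip(t - onset, 0.0, None) / tau
    return g_max * s * np.exp(1.0 - s)

def peak_depolarization(inh_onset=None, exc_onset=5.0, dt=0.01, t_end=40.0):
    """Peak voltage (relative to rest) of one passive compartment.

    The inhibitory reversal potential equals the resting potential, so the
    inhibition is 'silent': it shunts current without hyperpolarizing.
    """
    t = np.arange(0.0, t_end, dt)
    g_exc = alpha_conductance(t, exc_onset, tau=0.5, g_max=1.0)       # fast
    if inh_onset is None:
        g_inh = np.zeros_like(t)
    else:
        g_inh = alpha_conductance(t, inh_onset, tau=5.0, g_max=20.0)  # slow
    g_leak, capacitance, e_exc = 1.0, 1.0, 80.0  # assumed values
    v, peak = 0.0, 0.0
    for ge, gi in zip(g_exc, g_inh):
        # Forward Euler step of C dV/dt = -g_leak V - g_exc (V - E_exc) - g_inh V
        dv_dt = (-g_leak * v - ge * (v - e_exc) - gi * v) / capacitance
        v += dt * dv_dt
        peak = max(peak, v)
    return peak

control = peak_depolarization()                 # excitation alone
null = peak_depolarization(inh_onset=0.0)       # inhibition leads excitation
preferred = peak_depolarization(inh_onset=5.5)  # inhibition lags excitation
```

With these assumed parameters, inhibition that is already active when excitation arrives removes most of the depolarization, whereas inhibition that starts shortly after excitation reduces the peak far less.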

Very  recently,  Saito  et  al.  (1986)  have  proposed  that  a  more  complex  type  of  motion 
discrimination,  namely  cells  in  the  superior  temporal  sulcus  of  the  macaque  monkey  that 
respond  only  to  either  expanding  or  contracting  size  change  of  patterns  or  to  rotation  of 
patterns  in  one  direction,  is  based  on  local  synaptic  veto  operations  occurring  at  numerous 
independent  sites  in  the  dendritic  tree  of  these  cells.  Finally,  it  has  recently  been  proposed 
that the synaptic veto mechanism underlies the direction selective response of cells in the
somatosensory  cortex  of  awake  monkeys  when  wheels  with  surface  gratings  are  rolled  over 
their  skin  (Warren  et  al.,  1986). 

Open  Questions 


Evidence still seems inadequate to present a clear-cut case for either the correlation
or the gradient scheme for human motion discrimination. In fact, both schemes may be
used by the human visual system. Because the physiological and behavioral data seem to
indicate the validity of the correlation model for invertebrates and a large class of vertebrates,
it may be hypothesized that the Reichardt correlation model, possibly implemented via
the  synaptic  veto  mechanisms  of  Torre  and  Poggio  (1978),  is  used  in  the  primate  retina 
to  endow  some  cells  with  direction  selectivity.  These  cells,  which  cannot  exist  in  very 
large  numbers,  project  to  the  superior  colliculus  and  from  there  possibly  to  the  cortex. 
Motion discrimination in the cortex could be computed de novo within simple cells in
the  striate  cortex  by  use  of  a  different  scheme,  for  instance  the  gradient  scheme  of  Marr 
and  Ullman  (1981)  or  the  implementation  of  the  correlation  model  based  on  AND-NOT 
type  of  synaptic  logic  (Poggio,  1982;  Koch  and  Poggio,  1985).  Psychophysical  experiments 
may  thus  be  unable  to  separate  these  two  models.  Clearly,  what  is  needed  are  physiological 
experiments,  e.g.  single  cell  recordings  using  some  of  the  psychophysical  paradigms,  to 
identify  unambiguously  the  algorithm  used  to  detect  motion. 

In  the  section  on  the  biophysical  mechanisms  possibly  underlying  direction  selectivity, 
we discussed the strengths and limitations of simulating biophysical hardware, that is, neurons.
Modeling the events underlying a particular computation at the cellular level can give
us  valuable  insights  into  the  elementary  operations  underlying  information  processing  at 
the  single  cell  level,  operations  that  cannot  be  resolved  by  present  experimental  techniques 
because  of  the  small  distances  and  the  brief  times  involved.  Thus,  the  major  justification 
of  this  approach  is  its  predictive  power.  Computer  simulations  should  provide  a  number  of 
detailed  predictions  that  can  be  evaluated  experimentally.  Ideally,  these  predictions  should 
be  nontrivial  and  should  rule  out  alternative  explanations. 

The major drawback of this approach is that any model is only as good as its fundamental
assumptions. For instance, most of the studies addressing properties of the synaptic
veto operation assume the absence of any significant electrical nonlinearity, such as dendritic
spikes. This proviso must be taken into account when comparing experiments with
the theoretical predictions, and the effect of this simplifying assumption on the mechanism
in question must be carefully assessed (see O’Donnell, Koch and Poggio, 1985). Biophysical
models of the electrical properties of neurons depend on a host of parameters and
assumptions, most of which are poorly characterized. Thus, the foremost requirement of
any detailed model of cellular properties must be robustness: varying some parameter, such
as the membrane resistance, by a given amount should not lead to drastically changed properties
in the circuit except if some critical, and specified, value has been crossed. Ideally,
one  would  like  to  show  that  some  particular  behavior  occurs  for  a  broad  range  of  parameters 
and  is  not  overly  sensitive  to  any  one  of  them.  If  the  model’s  behavior  varies  dramatically 
by  changing  a  parameter,  for  instance  the  location  of  inhibition  with  respect  to  excitation, 
this  dependency  should  be  studied  carefully,  because  it  may  lead  to  interesting  predictions. 
Any  model  that  overly  constrains  a  parameter  seems  biologically  unreasonable. 
THE INTEGRATION OF EARLY MOTION MEASUREMENTS

Solving the Aperture Problem

The motion detection mechanisms described in the preceding section provide only partial
information about the 2-D pattern of movement in the changing image, due to a problem
often referred to as the aperture problem (Wallach, 1976; Fennema and Thompson, 1979;
Burt and Sperling, 1981; Horn and Schunck, 1981; Marr and Ullman, 1981; Adelson and
Movshon, 1982). Consider the computation of the projected 2-D velocity field for the rotating
wireframe object illustrated in Figure 5a. Suppose that the movement of features
on  the  object  were  first  detected  by  using  operations  that  examine  only  a  limited  area  of 
the  image,  such  as  those  performed  by  neural  mechanisms  with  spatially  limited  receptive 
fields.  The  information  provided  by  such  mechanisms  is  illustrated  in  Figure  5b.  The  ex¬ 
tended  edge  E  moves  across  the  image,  and  its  movement  is  observed  through  a  window 
defined  by  the  circular  aperture  A.  Through  this  window,  it  is  only  possible  to  observe  the 
movement  of  the  edge  in  the  direction  perpendicular  to  its  orientation.  The  component  of 
motion  along  the  orientation  of  the  edge  is  invisible  through  this  limited  aperture.  Thus  it 
is  not  possible  to  distinguish  between  motions  in  the  directions  b,  c  and  d.  This  failure  to 
distinguish  between  motions  when  the  object  is  viewed  through  a  small  window  has  been 
referred  to  as  the  aperture  problem,  and  is  inherent  in  any  motion  detection  operation  that 
examines  only  a  limited  area  of  the  image. 

As  a  consequence  of  the  aperture  problem,  the  measurement  of  motion  in  the  changing 
image  requires  two  stages  of  analysis:  the  first  stage  measures  components  of  motion  in 
the  direction  perpendicular  to  image  features;  the  second  combines  these  components  of 
motion  to  compute  the  full  2-D  pattern  of  movement  in  the  image.  In  Figure  5c,  a  circle 
undergoes pure translation to the right. The arrows along the contour represent the perpendicular
components of velocity that can be measured directly from the changing image.
These component measurements each provide some constraint on the possible motion of
the circle, as illustrated in Figure 5d. The bold vector v represents the local perpendicular
component of motion at a particular location in the image. The possible true motions at
that location are given by the set of velocity vectors whose endpoint lies along the line l
oriented perpendicular to the vector v. Examples of possible true velocities are indicated
by  the  dotted  vectors.  The  movement  of  image  features  such  as  corners  or  small  spots  can 
be  measured  directly.  In  general,  however,  the  first  measurements  of  movement  provide 
only  partial  information  about  the  true  movement  of  features  in  the  image,  and  must  be 
combined  to  compute  the  full  pattern  of  2-D  motion. 
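
The constraint described above can be made concrete with a few lines of vector arithmetic. The sketch below projects a true velocity onto the direction perpendicular to an edge, and checks whether a candidate velocity lies on the constraint line defined by a measured perpendicular component. The function names and the example velocities are illustrative assumptions.

```python
import numpy as np

def perpendicular_component(true_velocity, edge_angle_deg):
    """Component of a 2-D velocity perpendicular to an edge of the given orientation."""
    theta = np.deg2rad(edge_angle_deg)
    normal = np.array([-np.sin(theta), np.cos(theta)])  # unit normal to the edge
    v = np.asarray(true_velocity, dtype=float)
    return (v @ normal) * normal

def on_constraint_line(candidate, v_perp, tol=1e-9):
    """True if `candidate` ends on the line through v_perp perpendicular to v_perp."""
    candidate = np.asarray(candidate, dtype=float)
    v_perp = np.asarray(v_perp, dtype=float)
    return abs((candidate - v_perp) @ v_perp) < tol

# Three different true motions of a vertical edge (orientation 90 degrees).
candidates = [(1.0, 0.0), (1.0, 1.0), (1.0, -2.0)]
measured = [perpendicular_component(v, 90.0) for v in candidates]
```

All three candidate motions yield the same perpendicular component, a unit rightward velocity, which is why a local measurement alone cannot distinguish among them.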

The  measurement  of  movement  is  difficult  because  in  theory,  there  are  infinitely  many 
patterns  of  motion  that  are  consistent  with  a  given  changing  image.  For  example,  in  Figure 
5e,  the  contour  C  rotates,  translates  and  deforms  to  yield  the  contour  C'  at  some  later  time. 
The  true  motion  of  the  point  p  is  ambiguous.  Additional  constraint  is  required  to  identify  a 
Figure 5. The aperture problem in motion measurement. (a) On the top are three views of a
wireframe object undergoing rotation around a central vertical axis. On the bottom, the arrows
along the contours of the object represent the instantaneous velocity field at one position in the
object’s trajectory. For simplicity, an orthographic projection is used. (b) An operation that views
the moving edge E through the local aperture A can compute only the component of motion c in the
direction perpendicular to the orientation of the edge. The true motion of the edge is ambiguous.
(c) The circle undergoes pure translation to the right; the arrows represent the perpendicular
components of velocity that can be measured from the changing image. (d) The curve C rotates,
translates, and deforms over time to yield the curve C'. The velocity of the point p is ambiguous.
(e) The vector v represents the perpendicular component of velocity at some location in the image.
The true velocity at that location must project to the line l perpendicular to v; examples are shown
with dotted arrows.
unique solution.4 It should also be noted that in general, it may not be possible to recover
the 2-D projection of the true 3-D field of motions of points in space, from the changing
image intensities. Factors such as changing illumination, specularities, and shadows can
generate  patterns  of  optical  flow  in  the  image  that  do  not  correspond  to  the  real  movement 
of  surface  features.  The  additional  constraint  used  to  measure  image  motion  can  yield  at 
best  a  solution  that  is  most  plausible  from  a  physical  standpoint. 

Many  physical  assumptions  could  provide  the  additional  constraint  needed  to  compute  a 
unique  pattern  of  image  motion.  One  possibility  is  the  assumption  of  pure  translation.  That 
is,  it  is  assumed  that  velocity  is  constant  over  small  areas  of  the  image.  This  assumption  has 
been  used  both  in  computer  vision  studies  and  in  biological  models  of  motion  measurement 
(for  example,  Lappin  and  Bell,  1976;  Pantle  and  Picciano,  1976;  Fennema  and  Thompson, 
1979;  Anstis,  1980;  Marr  and  Ullman,  1981;  Thompson  and  Barnard,  1981;  Adelson  and 
Movshon,  1982).  Methods  that  assume  pure  translation  may  be  used  to  detect  sudden 
movements  or  to  track  objects  across  the  visual  field.  These  tasks  may  require  only  a  rough 
estimate of the overall translation of objects across the image. Tasks such as the recovery of
3-D  structure  from  motion  require  a  more  detailed  measurement  of  relative  motion  in  the 
image.  The  analysis  of  variations  in  motion  such  as  those  illustrated  in  Figure  5a  requires 
the  use  of  a  more  general  physical  assumption. 

Davis,  Wu  and  Sun  (1983)  proposed  a  computational  method  for  solving  the  aperture 
problem  that  assumes  that  the  pattern  of  image  motion  can  be  approximated  locally  by 
rigid  motion  in  the  image  plane.  In  more  recent  studies,  the  local  image  motions  have  been 
modeled by second-order polynomials in the image coordinates (Wohn, 1984; Waxman and
Wohn,  1985;  Wohn  and  Waxman,  1985;  Waxman,  1986).  This  approach  implicitly  assumes 
that  the  image  locally  represents  the  projection  of  a  quadric  surface  patch  in  motion. 

Other  computational  studies  have  assumed  that  velocity  varies  smoothly  across  the 
image (Horn and Schunck, 1981; Hildreth, 1984; Nagel, 1984; Nagel and Enkelmann, 1984,
1986;  Anandan  and  Weiss,  1985;  Scott,  1986).  The  assumption  rests  on  the  principle  that 
physical  surfaces  are  generally  smooth;  that  is,  variations  in  the  structure  of  a  surface  are 
usually  small,  compared  with  the  distance  of  the  surface  from  the  viewer.  When  surfaces 
move,  nearby  points  tend  to  move  with  similar  velocities.  There  exist  discontinuities  in 
movement  at  object  boundaries,  but  most  of  the  image  is  the  projection  of  relatively  smooth 
surfaces.  Thus,  it  is  natural  to  assume  that  image  velocities  vary  smoothly  over  most  of  the 
visual  field.  A  unique  pattern  of  movement  can  be  obtained  by  computing  a  velocity  field 
that  is  consistent  with  the  changing  image  and  has  the  least  amount  of  variation  possible. 
In  other  words,  a  pattern  of  movement  is  derived  for  which  nearby  points  in  the  image  move 
with  velocities  that  are  as  similar  as  possible. 

The  use  of  the  smoothness  assumption  for  motion  measurement  has  several  important 
attributes  from  a  computational  perspective.  First,  it  allows  general  motion  to  be  analyzed. 

4 Like many early vision problems, the measurement of motion is an ill-posed problem, as formalized
by Hadamard (Poggio, Torre and Koch, 1985). A body of mathematics known as
regularization theory may serve to unify the solution to many ill-posed problems in vision.
Surfaces can be rigid or nonrigid, undergoing any movement in space. It is always possible
to compute a projected velocity field that preserves the variation in the local pattern of
movement. Second, the smoothness assumption can be embodied in the motion measurement
computation in a way that guarantees a unique solution (Hildreth, 1984). Third, the
velocity  field  of  least  variation  can  be  computed  straightforwardly,  using  standard  computer 
algorithms  (Horn  and  Schunck,  1981;  Hildreth,  1984;  Nagel  and  Enkelmann,  1984,  1986; 
Anandan  and  Weiss,  1985),  as  well  as  simple  analog  resistive  networks  (Poggio,  Torre  and 
Koch,  1985;  Poggio  and  Koch,  1985). 
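
As an illustration of how a velocity field of least variation might be computed with a standard algorithm, the sketch below discretizes a closed contour into sample points and solves a linear least-squares problem. The cost combines squared differences between neighboring velocities with a penalty on disagreement with the measured perpendicular components. This discretization, the weight beta and the function name are assumptions made for illustration, not the specific formulation of any of the cited studies.

```python
import numpy as np

def smoothest_velocity_field(normals, normal_speeds, beta=10.0):
    """Least-variation velocity field along a closed contour (illustrative sketch).

    normals:       [n, 2] unit normals at n sample points along the contour.
    normal_speeds: [n] measured perpendicular speed at each point.
    Minimizes  sum_i |v_{i+1} - v_i|^2 + beta * sum_i (v_i . n_i - c_i)^2.
    """
    normals = np.asarray(normals, dtype=float)
    speeds = np.asarray(normal_speeds, dtype=float)
    n = len(speeds)
    rows, rhs = [], []
    for i in range(n):
        j = (i + 1) % n  # next point on the closed contour
        for k in range(2):  # smoothness rows for the x and y components
            row = np.zeros(2 * n)
            row[2 * j + k] = 1.0
            row[2 * i + k] = -1.0
            rows.append(row)
            rhs.append(0.0)
        # Data row: velocity at point i should match the measured normal speed.
        row = np.zeros(2 * n)
        row[2 * i:2 * i + 2] = np.sqrt(beta) * normals[i]
        rows.append(row)
        rhs.append(np.sqrt(beta) * speeds[i])
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution.reshape(n, 2)

# Translating circle: perpendicular components of a unit rightward translation.
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
circle_normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
measured_speeds = circle_normals @ np.array([1.0, 0.0])
field = smoothest_velocity_field(circle_normals, measured_speeds)
```

For the translating circle, the constant rightward field satisfies every measurement and has zero variation, so it is the minimizer that the sketch recovers.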

From  the  perspective  of  perceptual  psychology,  one  can  ask  whether  the  human  visual 
system  derives  patterns  of  movement  that  are  consistent  with  those  predicted  by  a  computa¬ 
tion  that  uses  the  smoothness  assumption.  In  particular,  one  can  ask  whether  an  incorrect 
pattern  of  motion  is  perceived  in  situations  in  which  a  computer  algorithm  also  fails.  The 
method  for  computing  the  velocity  field  suggested  by  Hildreth  (1984)  is  guaranteed  to  yield 
the  correct  solution  for  at  least  two  classes  of  motion:  (1)  pure  translation,  and  (2)  general 
motion  (translation  and  rotation)  of  rigid  3-D  objects  whose  edges  are  essentially  straight. 
For  example,  the  computation  yields  the  correct  velocity  field  for  the  moving  object  of 
Figure  5a.  For  smooth  curves  undergoing  rotation,  this  computation  sometimes  yields  a 
solution  that  differs  from  the  correct  projected  velocity  field.  The  human  visual  system  also 
appears  to  derive  an  incorrect  perception  of  motion  in  these  situations  (Hildreth,  1984). 
Comparisons between the results of computational models and perceptual behavior have
so far been only qualitative, however. Open questions remain regarding whether the human
visual system maintains a local representation of the pattern of image motions, and
whether perceived motion is quantitatively consistent with that expected from a computation
that uses the smoothness constraint. Perceptual studies indicate that when visual
patterns  undergo  uniform  translation,  human  observers  can  match  velocity  directions  to  a 
resolution  of  about  1°  (Levinson  and  Sekuler,  1976;  Nakayama  and  Silverman,  1983).  It  is 
not  yet  known,  however,  whether  such  precision  of  velocity  direction  is  also  obtained  when 
the  velocity  field  varies  continuously  across  the  visual  field. 

A  second  issue  that  arises  regarding  the  solution  to  the  aperture  problem  is  the  question 
of  whether  the  early  motion  measurements  are  integrated  over  2-D  areas  of  the  image 
or  along  connected  contours  such  as  edges.  Models  such  as  that  suggested  by  Horn  and 
Schunck  (1981)  integrate  these  measurements  over  areas,  while  the  model  proposed  by 
Hildreth  (1984)  integrates  motion  measurements  along  connected  contours.  This  issue  was 
addressed  in  a  recent  perceptual  study  by  Nakayama  and  Silverman  (1984a).  Their  study 
used  a  simple  distorted  line,  oscillating  up  and  down.  When  viewed  alone,  a  central  diagonal 
section  of  the  line  appeared  to  move  in  an  oblique  direction,  so  that  the  entire  figure 
appeared  nonrigid.  The  figure  could  be  made  to  appear  to  move  rigidly  up  and  down 
by  the  introduction  of  additional  features  that  were  unambiguously  moving  up  and  down. 
Nakayama  and  Silverman  introduced  both  breaks  on  the  contour  and  short  segments  off  the 
contour.  They  found  that  both  the  breaks  on  the  line  and  the  segments  off  the  line  could 
cause the central part of the line to appear to move up and down, but the features on the
contour had a much stronger effect, in that their distance from the center could be very large.
The  segments  had  to  be  very  close  to  the  line  in  order  to  exert  any  influence  on  the  perception 
of  its  motion.  These  phenomena  suggest  that  the  integration  of  motion  constraints  along 
contours  may  play  a  stronger  role  in  the  human  visual  system,  an  observation  that  is  also 
supported  by  perceptual  demonstrations  presented  by  Hildreth  (1984). 

The  local  perpendicular  components  of  motion  are  not  always  combined  by  the  human 
visual  system.  The  conditions  governing  whether  or  not  these  measurements  are  combined 
were  studied  by  Adelson  and  Movshon  (1982)  and  by  Nakayama  and  Silverman  (1983). 
In  the  Adelson  and  Movshon  study,  the  stimulus  patterns  consisted  of  two  superimposed 
sinewave  gratings  at  different  orientations,  moving  in  the  direction  perpendicular  to  their 
orientations.  Together,  the  two  gratings  formed  a  single  rigid  pattern,  moving  in  a  direction 
consistent  with  the  constraints  imposed  by  the  two  components.  Under  some  conditions, 
the  gratings  did  not  form  a  single  coherent  pattern  perceptually;  rather,  the  two  compo¬ 
nents  appeared  to  split  and  move  independently  of  one  another.  The  coherence  of  the 
combined  pattern  was  found  to  decrease  with  an  increase  in  any  of  the  following  factors: 
(1)  the  difference  in  contrast  between  the  two  gratings,  (2)  the  angle  between  the  primary 
directions  of  the  gratings,  (3)  the  difference  between  the  two  spatial  frequencies  and  (4) 
the  speed  of  movement  of  the  overall  pattern.  In  a  later  study  by  Adelson  (1984),  it  was 
shown  that  the  two  components  of  motion  would  also  appear  to  split  if  they  were  presented 
on  different  depth  planes.  This  observation  suggests  that  stereo  disparity  enters  into  the 
solution  to  the  aperture  problem  in  motion.  Nakayama  and  Silverman  (1983),  by  using 
stimuli  consisting  of  sinewave  lines,  demonstrated  that  two  components  of  motion  tend  not 
to  be  combined  if  their  orientations  are  very  similar  (i.e.  they  differ  by  at  most  about  30°). 
These  perceptual  studies  suggest  that  early  measurements  of  the  perpendicular  components 
of  motion  are  not  always  combined  by  the  human  visual  system.  Under  some  conditions, 
they  will  remain  separate,  resulting  in  a  perception  of  motion  that  corresponds  directly  to 
the  pattern  of  components.  More  generally,  these  studies  provide  implicit  support  for  the 
notion  that  motion  measurement  takes  place  in  two  stages,  with  the  first  stage  providing  the 
perpendicular  components  of  motion  and  the  second  stage  combining  these  components  into 
a single coherent pattern of motion. More explicit psychophysical support for a two-stage
motion  measurement  computation  is  presented  in  Movshon  et  al.  (1985). 

The motion measurement problem can also be examined from a physiological perspective.
Early movement detectors in biological systems have spatially limited receptive fields
and therefore face the aperture problem. Stimulated by a theoretical analysis of the aperture
problem, Movshon et al. (1985) sought and found direct physiological evidence for
a two-stage motion measurement computation in the primate visual system. Two visual
areas that include an abundance of motion-sensitive neurons are cortical area V1 and the
middle temporal area of extrastriate cortex (MT), located in the posterior bank of the superior
temporal sulcus (for example, see Maunsell and Van Essen, 1983; Van Essen and
Maunsell, 1983; Allman, Miezin and McGuinness, 1985; Saito et al., 1986). The explicit
role  of  area  MT  in  the  cortical  analysis  of  visual  motion  was  confirmed  recently  by  Newsome 
et  al.  (1985),  who  showed  that  small  restricted  chemical  lesions  in  area  MT  of  the  macaque 
monkey  led  to  a  behavioral  deficit  in  the  monkey’s  ability  to  match  the  velocity  of  smooth 
pursuit  eye  movements  with  the  velocity  of  visual  targets.  Moreover,  lesions  in  the  cat’s 
Clare-Bishop area, which is assumed to correspond to area MT in the macaque anatomically,
led to a much reduced ability of behaving cats to distinguish small moving figures
from both moving and stationary surrounds (Strauss and von Seelen, 1986). Movshon et
al. (1985) explored the type of motion analysis taking place in the primate’s MT, by using
the same stimulus with superimposed sinewave gratings used by Adelson and Movshon
(1982). The results of these experiments indicate that the selectivity of neurons in area V1
for  direction  of  movement  is  such  that  they  could  provide  only  the  component  of  motion 
in  the  direction  perpendicular  to  the  orientation  of  image  features.  These  neurons  essen¬ 
tially  only  respond  to  a  single  component  of  the  combined  grating  pattern,  independent  of 
the  presence  of  the  second  grating.  Area  MT,  however,  contains  a  subpopulation  of  cells, 
referred  to  as  pattern  cells,  that  appear  to  respond  to  the  2-D  direction  of  motion  of  the 
combined  grating  pattern.  For  example,  imagine  a  sinewave  grating  moving  diagonally  up 
this  page  (bottom  left  to  top  right)  and  a  second  pattern  superimposed  on  the  first,  moving 
diagonally down the page (top left to bottom right). A neuron in V1 whose best direction
is diagonally upward would respond to the superimposed pattern, as though the downward
moving  diagonal  were  not  even  present.  A  pattern  cell  in  MT,  however,  would  respond  to 
the  superimposed  patterns  as  though  they  were  moving  directly  across  the  page  from  left 
to  right.  Thus,  these  pattern  cells  may  serve  to  combine  motion  components  to  compute 
the real 2-D direction of velocity of a moving pattern. These experiments do not yet distinguish
between the use of the simple assumption of pure translation, as suggested in the
study by Movshon et al. (1985), and a more general assumption such as smoothness. Stimulus
patterns undergoing more complicated motions are required to make such a distinction.
If  the  pattern  cells  in  area  MT  employ  the  assumption  of  smoothness  in  their  computation 
of  motion,  one  would  expect  to  find  direct  interaction  between  pattern  cells  that  analyze 
nearby  areas  of  the  visual  field. 
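
Under the assumption of pure translation, the combination of two component motions reduces to intersecting two constraint lines. The sketch below works through the example of the two diagonal gratings; the function name and the unit component speeds are illustrative assumptions.

```python
import numpy as np

def pattern_velocity(direction1_deg, speed1, direction2_deg, speed2):
    """2-D velocity consistent with two perpendicular-component measurements.

    Each grating supplies one constraint, v . n = speed, where n is the unit
    vector along its direction of motion. Two non-parallel constraints give a
    2 x 2 linear system; np.linalg.solve raises LinAlgError if they are parallel.
    """
    angles = np.deg2rad([direction1_deg, direction2_deg])
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.linalg.solve(normals, np.array([speed1, speed2], dtype=float))

# One grating drifts up and to the right (+45 degrees), the other down and to
# the right (-45 degrees), both at unit speed.
velocity = pattern_velocity(45.0, 1.0, -45.0, 1.0)
```

The vertical components cancel and the solution points directly rightward, which is the direction a pattern cell would signal in the example above. The speed of the combined pattern exceeds that of either component, by a factor of the square root of 2 in this configuration.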

Poggio and Koch (1985; Poggio et al., 1985) presented hypothetical neural implementations
of regularization algorithms in terms of very simple linear, electrical or chemical,
analog networks. In particular, they proposed an implementation for the computation of
the smoothest velocity field as suggested by Hildreth (1984). From these networks, a neural
circuit is then designed that behaves in a similar way. Examples of the electrical and neural
networks are shown in Figure 6. In the network of Figure 6a, the currents I_i and conductances
g and g_i represent measurements of the perpendicular components of velocity and
other properties of a moving contour obtained directly from the image. The voltages V_i represent
the tangential component of velocity (i.e. the component of velocity in the direction
parallel to the orientation of features in the image) that is recovered by the computation
of the full 2-D velocity field. These analog resistive networks allow a fast computation of
the  smoothest  velocity  field  and  are  guaranteed  to  converge  to  the  correct  solution  (Poggio 
and  Koch,  1985).  In  the  corresponding  neural  implementation  of  Figure  6b,  the  tangential 
component  of  the  velocity  field  is  represented  by  the  voltages  V;  along  a  dendrite,  which 
are  sampled  by  dendro-dendritic  synapses.  Measurements  from  the  image  are  represented 
by  synaptically  mediated  current  injections  I,  and  other  synaptic  inputs  R,  (for  instance, 
a  silent  GABA.*  type  inhibitory  synapse)  that  control  the  membrane  resistance.  The  full 
2-D  velocity  field  is  represented  implicitly  by  the  combination  of  the  currents  I,  and  the 
voltages  V,.  This  hypothetical  neural  implementation  was  not  intended  as  a  specific  model 
for  the  measurement  of  motion  in  area  MT.  Rather,  its  intent  was  to  show  that  it  is  possi¬ 
ble  for  neural  hardware  to  exploit  a  model  of  this  computation  that  incorporates  a  general 
assumption  such  as  smoothness  of  the  velocity  field.  Models  such  as  this  can  help  to  focus 
experimental  questions  regarding  the  actual  neural  circuitry  in  areas  such  as  MT. 
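
The quadratic minimization that such a network is described as solving can be illustrated with a small discrete sketch. The Python code below assumes a closed contour sampled at N points and minimizes the sum of squared differences between neighboring velocities plus beta times the squared mismatch with the measured perpendicular components. It solves the problem directly by least squares rather than by simulating a resistive network, and the function name, the weight beta and the test contour are assumptions made for illustration.

```python
import numpy as np

def smoothest_velocity_field(normals, v_perp, beta=10.0):
    """Least-squares velocity field along a closed contour.

    Minimizes  sum_i |v[i+1] - v[i]|^2 + beta * sum_i (v[i] . n[i] - v_perp[i])^2.
    """
    n = np.asarray(normals, dtype=float)
    c = np.asarray(v_perp, dtype=float)
    N = len(c)
    rows, rhs = [], []
    for i in range(N):
        j = (i + 1) % N  # closed contour: the last point neighbors the first
        for k in range(2):  # smoothness rows for the x and y components
            row = np.zeros(2 * N)
            row[2 * j + k] = 1.0
            row[2 * i + k] = -1.0
            rows.append(row)
            rhs.append(0.0)
        # Data row: the velocity must agree with the measured normal component.
        row = np.zeros(2 * N)
        row[2 * i:2 * i + 2] = np.sqrt(beta) * n[i]
        rows.append(row)
        rhs.append(np.sqrt(beta) * c[i])
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol.reshape(N, 2)

# A circle translating with velocity (1.0, 0.5); only normal components are measured.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
normals = np.column_stack([np.cos(theta), np.sin(theta)])
true_v = np.array([1.0, 0.5])
v_perp = normals @ true_v
field = smoothest_velocity_field(normals, v_perp)
print(field[:3])
```

For a rigidly translating contour the uniform field satisfies every term exactly, so the sketch recovers the true translation at each sample point.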

Long  Range  Motion  Correspondence 

The preceding section addressed computational models that might underlie the short-range
process. The computation of a velocity field requires that motion in the image be
roughly continuous. The perception of motion by the human visual system does not, however,
require that objects move continuously across the visual field. Motion can be inferred
when features are presented discretely at positions separated by up to several degrees of
visual angle and with long temporal intervals between presentations. There are many visual
patterns that yield qualitatively different perceptions of motion, depending on the size of
the spatial and temporal displacements between frames (for example, Ternus, 1926; Anstis,
1970, 1980; Braddick, 1974, 1980; Anstis and Rogers, 1975; Pantle and Picciano, 1976;
Petersik and Pantle, 1979; Shepard and Judd, 1976; Burt and Sperling, 1981; Green and
von Grünau, 1983; Hildreth, 1984; Anstis and Mather, 1985). Although the short-range
and long-range motion processes may interact at some stage (Clatworthy and Frisby, 1973;
Green and von Grünau, 1983), there is evidence that they are initially distinct processes
(Mather, Cavanagh and Anstis, 1985; Gregory, 1985; Anstis and Mather, 1985).

The  long-range  motion  phenomena  illustrate  the  ability  of  the  human  visual  system  to 
derive  a  correspondence  between  elements  in  the  changing  image,  over  considerable  distances 
and  temporal  intervals.  Under  these  conditions,  there  is  no  continuous  motion  of  elements 
across  the  image  to  be  measured  directly.  A  correspondence  computation  is  therefore  likely 
to  underlie  the  long-range  motion  process.  Two  issues  arise  regarding  this  computation: 
first, what features in the image are matched from one moment to the next, and second, how
is a unique correspondence of features established? Similar to the velocity field computation,
many possible matchings between features in two images exist, and additional constraints
must be imposed to compute a single correspondence that is most plausible from a physical
standpoint. 

Figure 6 Analog models of the velocity field computation. (a) A simple resistive network that
computes the smoothest velocity field. The conductances g and g_i and the currents I_i represent
properties of a moving contour that are measured directly from the image. In particular, g_i is
proportional to the square of the contrast of the contour at location i. The 2-D velocity field along
the contour is represented implicitly by the combination of these inputs and the resulting voltages
V_i. (b) A hypothetical neural implementation of the circuit shown in (a). Synaptically mediated
currents I_i and additional inputs R_i (possibly a GABA_A type of synapse) represent properties
of a moving contour. The resulting voltages V_i, sampled by dendro-dendritic synapses, together
with the input currents, represent local velocities along the contour. Redrawn from Poggio and
Koch (1985).


The  possible  image  features  that  could  form  the  matching  elements  span  a  wide  range, 
from  simple  edge  and  line  segments,  points,  and  blobs,  to  texture  boundaries,  subjective 
contours  and  groups  of  primitive  features,  and  even  to  structured  forms  or  entire  objects. 
Motion  measurement  schemes  used  in  computer  vision,  reviewed  for  example  in  Thompson 
and  Barnard  (1981),  Ullman  (1981a)  and  Barron  (1984),  have  considered  most  of  these 
possible  matching  elements.  In  general,  the  earlier  tokens  such  as  edge  and  line  segments 
are  easier  to  compute,  but  there  is  greater  ambiguity  in  the  matching  of  these  tokens  from 
one  moment  to  the  next.  The  use  of  primitive  tokens  also  allows  the  correspondence  process 
to operate on arbitrary objects undergoing complex shape changes. More complex tokens
such as structured forms can simplify the correspondence process, but more computation is
required  to  extract  these  features  from  the  image,  and  there  is  less  flexibility  in  the  types 
of  motion  that  can  be  analyzed. 

Perceptual studies suggest that many long-range motion phenomena can be explained
in terms of a correspondence of elements such as edges, bars, line terminations, points,
etc. (Ullman, 1979). The human visual system can also establish a correspondence between
groups of primitive elements even when the constituents of the groups are not the
same (Riley, 1981), subjective contours and texture boundaries (Ramachandran, Rao and
Vidyasagar, 1973; Riley, 1981) and subjective surfaces (Ramachandran, 1985). Properties
of primitive elements such as orientation, contrast and size can influence the correspondence
computation (for example, Frisby, 1972; Kolers, 1972; Ullman, 1979, 1981b), although it
is  possible  to  establish  a  correspondence  between  objects  that  differ  significantly  in  their 
components  (Navon,  1976;  Anstis  and  Mather,  1985).  Chen  (1985)  has  suggested  that 
topological  features  such  as  connectivity,  closure  and  the  presence  of  holes  can  play  a  role 
in  motion  correspondence,  but  it  is  not  clear  whether  these  properties  are  made  explicit  in 
the  description  of  the  matching  elements,  or  whether  they  are  reflected  in  the  constraints 
that  are  used  to  establish  a  unique  correspondence  of  elements  between  frames. 

The rules or constraints that are used by the human visual system to establish a correspondence
of elements between frames have also been explored in many studies. Early
perceptual studies focused on the role of the time and distance between elements in successive
frames (for example, Ternus, 1926; Kolers, 1972; Burt and Sperling, 1981). When
the elements in motion are isolated dots, each dot in general ‘prefers’ to match its nearest
neighbor in the subsequent frame, although this constraint sometimes can be violated
locally when a field of dots in motion interacts (Ullman, 1979; Burt and Sperling, 1981).
The distance metric that is used in the correspondence process appears to be based on 2-D
distances between elements rather than 3-D distances (Ullman, 1979; Mutch, Smith and
Yonas, 1983; Tarr and Pinker, 1985). Ramachandran and Anstis (1983, 1985) showed that
‘inertia’ can influence correspondence; that is, in ambiguous situations, moving elements
will  tend  to  maintain  the  same  direction  of  motion  over  time. 
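
The nearest-neighbor preference described above can be written down in a few lines. The following Python sketch matches each dot in one frame to the closest dot in the next frame using 2-D image distance. It is a deliberately simple illustration rather than any published correspondence model, and the function name and dot coordinates are assumptions.

```python
import numpy as np

def nearest_neighbor_matches(frame1, frame2):
    """For each dot in frame1, return the index of the closest dot in frame2 (2-D distance)."""
    a = np.asarray(frame1, dtype=float)
    b = np.asarray(frame2, dtype=float)
    # Pairwise Euclidean distances in the image plane, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.argmin(axis=1)

frame1 = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
frame2 = [(0.5, 4.2), (0.6, 0.1), (4.5, -0.2)]
matches = nearest_neighbor_matches(frame1, frame2)
print(matches)  # index into frame2 chosen for each dot of frame1
```

Because each dot is matched independently, nothing in this sketch prevents two dots from choosing the same partner, and it takes no account of interactions within a field of dots or of the tendency to maintain direction over time.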

A computational model of correspondence presented by Ullman (1979) assumes independence
of the matching elements. Subsequent studies have revealed situations in which
the independence assumption appears not to hold. For example, the perceived motion of a
feature can be influenced by the motion of other features connected to it along a contour
(Hildreth, 1984; Chen, 1985). Ramachandran and Anstis (1985) created a display in which
a local pattern of dots whose motion was two-way ambiguous was repeated in a large array.
Each local subpattern could in principle be perceived as moving in either of two directions,
but observers always perceived the array of patterns as all moving in the same direction.
The correspondence established within one subpattern of the display could influence the
correspondence  of  dots  in  neighboring  subpatterns. 

To summarize, much is known about the matching elements in long-range motion correspondence,
and the rules or constraints used to match elements. Many recent perceptual
studies were motivated by computational models of the correspondence process. At present,
however, there are no computational models that adequately account for all of the long-range
motion phenomena observed in perceptual studies. Recent physiological studies that
explored the response of MT neurons to apparent movement stimuli (Newsome, Mikami
and Wurtz, 1982, 1986; Mikami, Newsome and Wurtz, 1986) suggest that area MT might
provide some of the neural substrate for the interpretation of long-range motion.

The  Detection  of  Motion  Discontinuities 

If  two  adjacent  surfaces  undergo  different  motions,  a  discontinuity  generally  occurs 
in  the  optical  flow  or  velocity  field  along  their  boundary.  The  explicit  detection  of  motion 
discontinuities  allows  the  detection  and  localization  of  object  boundaries  in  the  scene.  Other 
cues  to  the  presence  of  boundaries  often  occur  as  well,  such  as  sharp  changes  in  stereo 
disparity  or  texture,  but  perceptual  studies  suggest  that  it  is  possible  to  detect  object 
boundaries  on  the  basis  of  motion  information  alone  (Anstis,  1970;  Regan  and  Spekreijse, 
1970;  Julesz,  1971)  and  to  use  the  relative  motions  in  the  vicinity  of  these  boundaries  to 
infer  the  relative  locations  of  surfaces  in  depth  (Kaplan,  1969;  Nakayama  and  Loomis,  1974; 
Mutch  and  Thompson,  1985). 

It  is  advantageous  to  detect  motion  discontinuities  as  early  as  possible,  for  two  reasons. 
First,  the  fast  detection  of  a  sudden  relative  movement  in  the  environment  can  serve  as 
an early warning system, alerting the observer to a possible prey or predator, or to the
sudden  movement  of  an  object  toward  the  viewer.  It  is  essential  not  only  to  detect  the 
presence  of  movement,  but  also  to  identify  the  outline  of  the  object.  A  second  reason  for 
detecting  motion  discontinuities  early  is  that  they  facilitate  the  subsequent  measurement 
of 2-D motion in the image. It was noted earlier that the computation of a velocity field
requires  the  integration  of  local  measurements  of  the  perpendicular  components  of  motion. 
Motion  measurements  should  only  be  combined  within  single  surfaces,  as  the  combination 
of  measurements  across  object  boundaries  will  generally  yield  errors  in  the  velocity  field.  If 
detected  early,  the  motion  discontinuities  can  define  regions  of  the  image  within  which  the 
local  motion  measurements  should  be  combined. 

With regard to computational schemes, one issue that arises is the question of at what
stage in the analysis of the image discontinuities should first be detected. Three alternatives
present themselves. First, motion discontinuities could be localized prior to the computation
of the full velocity field, just after the initial measurements of the perpendicular components
of  motion  in  the  image  (for  example,  Schunck  and  Horn,  1981;  Hildreth,  1984).  Schunck 
and  Horn  used  simple  heuristics  to  avoid  combining  motion  measurements  that  are  likely 
to  occur  on  surfaces  undergoing  different  motions.  Hildreth  presented  a  scheme  to  detect 
sudden  changes  in  the  perpendicular  components  of  motion,  which  uses  techniques  that 
were previously used for edge detection (Marr and Hildreth, 1980). H. Bülthoff and T.
Poggio (1986, personal communication) use the binary output of simple correlation-like
detectors,  signaling  motion  to  the  left  or  right,  to  localize  discontinuities  in  dense  random 
dot  patterns.  Surprisingly,  such  a  simple  measure  gives  a  fairly  accurate  assessment  of 
discontinuities,  at  least  for  random-dot  stimuli. 

A  second  possible  stage  at  which  boundaries  can  be  detected  is  after  the  velocity 
field  has  been  computed  explicitly  everywhere.  For  example,  Nakayama  and  Loomis  (1974) 
proposed  a  local  center-surround  operator  to  detect  boundaries  in  optical  flow  fields.  Similar 
ideas  are  incorporated  in  models  suggested  by  Clocksin  (1980),  and  Thompson,  Mutch  and 
Berzins  (1982,  1985;  Mutch  and  Thompson,  1985),  which  use  a  Laplacian  operator  applied 
to  components  of  the  optical  flow  field.  In  other  schemes  explored,  for  example,  by  Potter 
(1977)  and  Fennema  and  Thompson  (1979;  Thompson,  1980),  region-growing  techniques 
are  used  to  group  together  elements  of  similar  velocities. 
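
As a rough illustration of applying a Laplacian operator to the components of a flow field that has already been computed, the Python sketch below uses a five-point discrete Laplacian on a small synthetic field. It is not a reimplementation of any of the cited schemes; the grid size, the two speeds and the threshold are assumptions chosen so that the boundary is easy to see.

```python
import numpy as np

def laplacian(f):
    """Five-point discrete Laplacian on the interior of a 2-D array (border values set to 0)."""
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    out[1:-1, 1:-1] = (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
                       - 4.0 * f[1:-1, 1:-1])
    return out

def motion_boundary_strength(u, v):
    """Magnitude of the Laplacian response of the two flow components."""
    return np.hypot(laplacian(u), laplacian(v))

# Synthetic flow: the left half moves rightward at speed 1, the right half at speed 3.
u = np.ones((8, 8))
u[:, 4:] = 3.0
v = np.zeros((8, 8))
strength = motion_boundary_strength(u, v)
boundary_cols = np.flatnonzero(strength[1:-1].max(axis=0) > 0.5)
print(boundary_cols)  # columns on either side of the speed change
```

The response is zero inside each uniformly moving region and nonzero only in the two columns that straddle the change in speed, where the Laplacian of the horizontal component changes sign.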

Finally,  the  velocity  field  and  its  discontinuities  could  be  computed  simultaneously.  In 
a  scheme  suggested  by  Wohn  (1984)  and  Waxman  (1986),  the  motion  segmentation  problem 
is  approached  by  detecting  “boundaries  of  analyticity”  at  which  an  approximation  of  the 
local  image  flow  by  second  order  polynomials  breaks  down.  The  boundaries  are  located 
within  the  process  that  models  the  local  motion  field.  Koch,  Marroquin  and  Yuille  (1986a) 
have  proposed  that  binary  line  processes,  first  introduced  in  the  solution  of  vision  problems 
by  Geman  and  Geman  (1984),  can  successfully  demarcate  motion  boundaries.  At  locations 
where  this  line  process  is  set,  an  unobservable  line  or  edge  is  postulated  to  interrupt  the 
otherwise  smooth  velocity  field,  segmenting  the  image  into  its  natural  components.  The 
appropriate  algorithm  can  be  formulated  as  an  energy  minimization  problem  that  maps 
naturally  into  simple  analog  networks  (Koch,  Marroquin  and  Yuille,  1986). 
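
The role of a binary line process in such an energy can be seen in a one-dimensional sketch. The Python code below assumes an energy with three terms: agreement with the data, smoothness between neighbors (switched off wherever a line is set), and a fixed cost per line. This form, the parameters lam and alpha, and the function names are assumptions made for illustration and are not the specific functional of the cited papers. The code only evaluates the energy and chooses the lines for a fixed velocity profile; a full scheme would also minimize over the velocities.

```python
import numpy as np

def energy(v, d, lines, lam=1.0, alpha=1.0):
    """Data term + smoothness term (disabled across set lines) + cost per line."""
    v, d, lines = (np.asarray(a, dtype=float) for a in (v, d, lines))
    data = np.sum((v - d) ** 2)
    smooth = lam * np.sum((1.0 - lines) * np.diff(v) ** 2)
    penalty = alpha * np.sum(lines)
    return data + smooth + penalty

def best_lines(v, lam=1.0, alpha=1.0):
    """For fixed v, set a line wherever breaking smoothness costs less than keeping it."""
    return (lam * np.diff(np.asarray(v, dtype=float)) ** 2 > alpha).astype(int)

# Measured velocities along a line with one step change (assumed data).
d = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
no_lines = [0, 0, 0, 0, 0]
lines = best_lines(d)
e_without = energy(d, d, no_lines)
e_with = energy(d, d, lines)
print(lines, e_without, e_with)
```

Setting a single line at the step removes the large smoothness penalty at the price of one line cost, so the configuration with the line has the lower energy.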

A  detailed  neural  circuitry  for  the  detection  of  motion  discontinuities  by  the  housefly 
was proposed by Reichardt, Poggio and Hausen (1983; Reichardt and Poggio, 1979). Large
field binocular “pool” cells summate the output of a retinotopic array of small field elementary
movement detectors (EMD) over a large part of the visual field of the two compound
eyes. The EMD signal movement in one of two directions: progressive, i.e. movement from
front to back, and regressive, i.e. movement from back to front. The pool cells inhibit in
turn, via a silent or shunting inhibition (see the section on circuitry and biophysics), the
signals provided by the EMD, irrespective of their preferred direction. After inhibition of
each channel, all signals from the EMD feed into a large field output cell. This circuit shows
two important properties: it detects relative motion of a moving figure superimposed on
a stationary background of the same texture as the figure, and its output, the optomotor
response, is independent of the size of the moving figure. Motion discontinuities are signaled by
significant  activity  in  the  output  cells.  The  model  agrees  well  with  behavioral  data  from 
the  fly.  Moreover,  elements  of  the  proposed  circuitry  can  be  identified  with  anatomically 
and  physiologically  characterized  cells  in  the  visual  system  of  the  fly  (Egelhaaf,  1985). 

Physiological  studies  have  revealed  center-surround  mechanisms  that  are  organized 
antagonistically for direction of motion in many vertebrate species (for example, Sterling
and Wickelgren, 1969; Collett, 1972; Bridgeman, 1972; Frost, 1978; Frost, Scilley and
Wong, 1981; Frost and Nakayama, 1983). Motion-sensitive cells with this organization
have been found recently in area MT of the owl monkey (Miezin, McGuinness and Allman,
1982; Allman, Miezin and McGuinness, 1985) and in striate cortex of the cat (Orban et
al., 1986). The existence of center-surround relative motion detection mechanisms across
such  a  range  of  species  suggests  that  a  similar  strategy  may  be  utilized  in  the  underlying 
computations.  Richards  and  Lieberman  (1982)  show  in  psychophysical  studies  that  some 
viewers  are  “blind”  to  shearing  motions,  and  suggest  that  the  neural  substrate  for  detecting 
such  discontinuous  motions  may  be  independent  from  mechanisms  detecting  other  motion 
boundaries. 

Psychophysical  studies  of  motion  discontinuities  have  mainly  used  dynamic  random  dot 
patterns,  in  which  only  motion  cues  signal  the  presence  of  boundaries.  Braddick’s  (1974, 
1980) studies revealed a limit on the spatial and temporal displacements required to perceive
coherent motion in dense random dot patterns, and showed that it was possible to
detect a boundary between coherent and incoherent fields of motion. Experiments by Baker
and Braddick (1982a) and van Doorn and Koenderink (1982, 1983) suggest that the detection
of discontinuities is not based on a computation that explicitly measures only relative
movement; rather, an absolute measurement of motion takes place first, followed by a process
that compares nearby motions to locate discontinuities. Baker and Braddick (1982b)
showed  that  the  ability  to  discriminate  the  orientation  of  a  patch  that  moves  against  an 
uncorrelated  background  varies  little  with  dot  density  and  increases  with  the  patch  size  (see 
also Chang and Julesz, 1983). In general, the size of a patch of moving dots that can be discriminated
against a differentially moving background increases with larger displacements
of the dots between frames (Hildreth, 1984). This phenomenon may reflect the limitations
of multiple spatial frequency channels involved in the early detection of motion. Other perceptual
studies have shown that spatial frequency plays a role in determining the maximum
displacements  that  allow  the  perception  of  coherent  motion  in  random  dot  patterns  (Chang 
and  Julesz,  1983;  Nakayama  and  Silverman,  1984b). 

It  is  important  to  draw  a  distinction  between  the  ability  to  detect  differences  in  motion, 
and  the  ability  to  localize  a  boundary  between  surfaces  undergoing  different  motions.  For 
example,  if  two  adjacent  fields  are  undergoing  motion  in  the  same  direction,  a  5%  difference 
in  speed  is  sufficient  to  detect  relative  movement  (McKee,  1981;  Nakayama,  1981).  To 
localize a boundary, however, requires much larger differences in speed, between 50 and 60%
(van Doorn and Koenderink, 1982, 1983; Hildreth, 1984). If two adjacent surfaces undergo
motions  with  similar  speeds  but  different  directions,  then  an  angular  change  in  direction  of 
at  least  20°  is  required  to  localize  the  position  of  the  boundary  (Hildreth,  1984). 

Experimental  studies  have  provided  much  insight  into  the  nature  of  the  mechanisms 
that underlie the detection of motion discontinuities in biological systems. Many fundamental
questions still remain, however. Perhaps the most basic open question concerns at
what stage in the analysis of motion the discontinuities are first detected. It is not known,
for example, what representation of motion forms the input to the center-surround mechanisms
observed in area MT by Allman, Miezin and McGuinness (1985). These mechanisms
may  operate  directly  on  the  perpendicular  components  of  motion,  or  they  may  operate  on 
the  real  2-D  directions  of  image  motion.  Psychophysical  studies  have  not  yet  addressed 
this  issue  directly.  Furthermore,  while  physiological  studies  reveal  that  some  sort  of  center- 
surround  mechanisms  are  involved  in  the  detection  of  relative  movement,  little  is  known 
about what these mechanisms really compute and how they compute this information. Further
computational studies are needed to examine possible algorithms for detecting motion
boundaries  that  may  utilize  these  center-surround  mechanisms. 

THE RECOVERY OF THREE-DIMENSIONAL STRUCTURE FROM MOTION

When an object moves in space, the motions of individual points on the object differ
in  a  way  that  conveys  information  about  its  3-D  structure,  as  illustrated  in  Figure  5a.  The 
directions  of  motion  in  this  case  are  all  horizontal,  but  the  speed  of  movement  varies  in 
a  way  that  depends  on  the  structure  of  the  object.  Using  wireframe  objects  such  as  that 
shown  in  Figure  5a,  Wallach  and  O’Connell  (1953)  showed  that  the  human  visual  system 
can  derive  the  correct  3-D  structure  of  moving  objects  from  their  changing  2-D  projection 
alone.  Other  perceptual  studies  also  demonstrated  this  remarkable  ability  (for  example, 
Green, 1961; Braunstein, 1962, 1976; Johansson, 1973, 1975; Rogers and Graham, 1979;
Ullman, 1979; Cutting, 1982; Cutting and Proffitt, 1982). Relative motion in the image
is  also  created  by  movement  of  the  observer  relative  to  the  environment,  and  can  be  used 
to  infer  observer  motion  from  the  changing  image  (Gibson,  1950;  Lee  and  Aronson,  1974; 
Johansson,  1971;  Lee,  1980). 

Theoretically,  the  two  problems  of  (1)  recovering  the  3-D  structure  and  movement  of 
objects  in  the  environment  and  (2)  recovering  the  3-D  motion  of  the  observer  from  the 
changing  image,  are  closely  related.  The  main  difficulty  faced  by  both  is  that  infinitely 
many  combinations  of  3-D  structure  and  motion  could  give  rise  to  any  particular  2-D 
image. To resolve this inherent ambiguity, it is necessary to impose an additional constraint
that  allows  most  3-D  interpretations  to  be  ruled  out,  leaving  one  that  is  most  plausible 
from  a  physical  standpoint.  Computational  studies  have  used  the  rigidity  assumption  to 
derive  a  unique  3-D  structure  and  motion;  they  assume  that  if  it  is  possible  to  interpret 
the  changing  2-D  image  as  the  projection  of  a  rigid  3-D  object  in  motion,  then  such  an 
interpretation  should  be  chosen  (for  example,  Ullman,  1979,  1983;  Clocksin,  1980;  Prazdny, 
1980, 1983; Longuet-Higgins, 1981; Longuet-Higgins and Prazdny, 1981; Tsai and Huang,
1981; Hoffman and Flinchbaugh, 1982; Bobick, 1983; Mitiche, 1984, 1986; Mitiche, Seida
and Aggarwal, 1985; Waxman and Ullman, 1985). When the rigidity assumption is used in
this way, the recovery of structure from motion requires the computation of the rigid 3-D
object that would project onto a given 2-D image. The rigidity assumption was suggested
by perceptual studies that described a tendency for the human visual system to choose a
rigid interpretation of moving elements (Wallach and O’Connell, 1953; Gibson and Gibson,
1957; Green, 1961; Jansson and Johansson, 1973; Johansson, 1975, 1977).

Computational  studies  have  shown  that  the  rigidity  assumption  can  be  used  to  derive 
a unique 3-D structure from the changing 2-D image. Furthermore, this unique 3-D interpretation
can be derived by integrating image information only over a limited extent in
space  and  in  time.  For  example,  suppose  that  a  rigid  object  in  motion  is  projected  onto 
the  image  plane  by  using  orthographic  projection.  Three  distinct  views  of  four  points  on 
the moving object are sufficient to compute a unique rigid 3-D structure for the points
(Ullman, 1979). In general, if only two views of the moving points are considered or fewer
points are observed, there are multiple rigid 3-D structures consistent with the changing
2-D projection. If a perspective projection of objects onto the image is used instead, then
two  distinct  views  of  seven  or  eight  points  in  motion  are  usually  sufficient  to  compute  a 
unique  3-D  structure  for  the  points  (Longuet-Higgins,  1981;  Tsai  and  Huang,  1981).  If 
the  instantaneous  velocity  of  movement  in  the  image  is  known  at  discrete  points,  then 
under  perspective  projection,  the  position  and  velocity  at  five  points  may  be  sufficient  to 
derive  a  unique  structure  (Prazdny,  1980;  Roach  and  Aggarwal,  1980).  Longuet-Higgins 
and Prazdny (1981) originally showed that if the continuous velocity field is known everywhere
within a region of the image, then the velocity field together with its first and second
spatial derivatives at a point is consistent with at most three possible surface orientations
at that point. Waxman, Kamgar-Parsi and Subbarao (see Waxman, 1986) have recently
shown  that  a  unique  solution  can  usually  be  determined  in  this  case.  Finally,  for  the  case 
of  orthographic  projection,  3-D  structure  can  be  recovered  uniquely  if  both  the  velocity 
and  acceleration  fields  are  known  within  a  region  (Hoffman,  1982).  Additional  theoretical 
results  have  been  obtained  for  classes  of  restricted  motion,  such  as  planar  surfaces  in  motion 
(Hay, 1966; Koenderink and van Doorn, 1976; Buxton et al., 1984; Longuet-Higgins, 1984;
Murray  and  Buxton,  1984;  Kanatani,  1985;  Waxman  and  Ullman,  1985;  Ullman,  1985; 
Negahdaripour  and  Horn,  1985;  Subbarao  and  Waxman,  1985),  pure  translatory  motion  of 
the observer (Clocksin, 1980; Lawton, 1983; Jerian and Jain, 1984), planar or fixed axis rotation
(Hoffman and Flinchbaugh, 1982; Webb and Aggarwal, 1981; Bobick, 1983; Bennett
and  Hoffman,  1985;  Sugie  and  Inagaki,  1984),  translation  perpendicular  to  the  rotation  axis 
(Longuet-Higgins,  1983),  and  motion  of  quadratic  surfaces  (Waxman  and  Ullman,  1985; 
Waxman  and  Wohn,  1985).  A  review  of  early  theoretical  results  regarding  the  recovery  of 
structure  from  motion  can  be  found  in  Ullman  (1983). 

The  theoretical  results  summarized  above  are  important  for  the  study  of  the  recovery 
of  structure  from  motion  in  biological  vision  systems,  for  at  least  two  reasons.  First,  they 
show  that  by  using  the  rigidity  assumption,  a  unique  structure  can  be  recovered  from 
motion  information  alone.  It  is  not  necessary  to  make  further  physical  assumptions,  in 
order to obtain a unique solution. Second, these results show that it is possible to recover
3-D structure by integrating image information over a small extent in space and in time.
This  second  observation  could  bear  on  the  neural  mechanisms  that  compute  structure  from 
motion;  in  principle,  they  need  only  integrate  motion  information  over  a  limited  area  of  the 
visual  field  and  a  limited  extent  in  time. 

The  above  computational  studies  of  the  recovery  of  structure  from  motion  also  provide 
algorithms  for  deriving  the  structure  of  moving  objects.  Typically,  measurements  of  the 
positions  or  velocities  of  image  features  give  rise  to  a  set  of  mathematical  equations  whose 
solution represents the desired 3-D structure. The algorithms generally derive this structure
from motion information extracted over a limited area of the image and a limited extent in
time. Testing of these algorithms reveals that although this strategy is possible in theory,
it  is  not  reliable  in  practice.  A  small  amount  of  error  in  the  image  measurements  can  lead 
to  very  different  (and  often  incorrect)  3-D  structures.  This  behavior  is  due  in  part  to  the 
observation  that  over  a  small  extent  in  space  and  time,  very  different  objects  can  induce 
almost  identical  patterns  of  motion  in  the  image  (Ullman,  1983,  1984). 

This  sensitivity  to  error  inherent  in  algorithms  that  integrate  motion  information  only 
over a small extent in space and time suggests that a robust scheme for deriving structure
should use image information that is more extended in space or time or both. This
conclusion is supported in recent computational studies (Bruss and Horn, 1983; Lawton,
1983; Ullman, 1984; Adiv, 1985; Negahdaripour and Horn, 1985; Waxman and Wohn, 1985;
Wohn  and  Waxman,  1985).  Lawton  (1983)  showed  that  recovery  of  the  translatory  motion 
of  an  observer  could  be  coupled  with  the  solution  to  the  motion  correspondence  problem 
over  an  extended  region  of  the  image,  to  yield  a  robust  solution.  Adiv  (1985)  presented 
an  algorithm  for  recovering  the  motion  parameters  for  several  moving  objects,  which  as¬ 
sumes  that  object  surfaces  are  piecewise  planar.  The  extraction  of  the  motion  parameters 
uses  a  least-squares  approach  that  minimizes  the  deviation  between  the  measured  flow  field 
(at  a  large  number  of  points)  and  that  predicted  from  the  estimated  motion  and  structure 
(Bruss  and  Horn,  1983).  Negahdaripour  and  Horn  (1985)  also  addressed  the  recovery  of 
the  motion  of  an  observer  relative  to  a  stationary  planar  surface,  and  showed  that  a  robust 
recovery  of  the  observer  motion  and  the  orientation  of  the  plane  is  possible  when  dense 
measurements  of  the  spatial  and  temporal  derivatives  of  image  brightness  are  integrated 
over  a  large  region  of  the  changing  image.  Thus,  consideration  of  motion  information  that 
is  more  extended  in  space  can  lead  to  a  stable  recovery  of  structure.  The  study  by  Ullman 
(1984),  elaborated  below,  demonstrated  that  a  robust  recovery  of  structure  is  also  possible 
when  motion  information  is  integrated  over  an  extended  period  of  time.  The  extension  in 
time  can  be  achieved,  for  example,  by  considering  a  large  number  of  discrete  frames  or  by
observing  continuous  motion  over  a  significant  temporal  extent. 
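
The least-squares idea described above can be illustrated with a deliberately simplified sketch. It is not the algorithm of Bruss and Horn (1983) or Adiv (1985). It assumes an observer undergoing pure rotation, perspective projection with unit focal length, and one common sign convention for the rotational flow equations. The function names, the noise level, and the grid of sample points are all hypothetical choices made for illustration. Under these assumptions the predicted flow is linear in the three rotation parameters, so the parameters that minimize the squared deviation from the measured flow follow from a 3 x 3 system of normal equations.

```python
import random

def rotational_flow(x, y, rot):
    """Image flow (u, v) at image point (x, y) for rotation rot = (a, b, c).
    Assumed convention: unit focal length, no observer translation."""
    a, b, c = rot
    u = a * x * y - b * (x * x + 1.0) + c * y
    v = a * (y * y + 1.0) - b * x * y - c * x
    return u, v

def solve3(m, r):
    """Solve the 3 x 3 linear system m w = r by Gaussian elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, r)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, 3):
            factor = a[k][col] / a[col][col]
            for j in range(col, 4):
                a[k][j] -= factor * a[col][j]
    w = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = a[i][3] - sum(a[i][j] * w[j] for j in range(i + 1, 3))
        w[i] = s / a[i][i]
    return w

def fit_rotation(points, flows):
    """Least-squares rotation estimate from flow measured at many points."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for (x, y), (u, v) in zip(points, flows):
        # Each measurement contributes two linear equations in (a, b, c).
        rows = (((x * y, -(x * x + 1.0), y), u),
                ((y * y + 1.0, -x * y, -x), v))
        for row, f in rows:
            for i in range(3):
                atb[i] += row[i] * f
                for j in range(3):
                    ata[i][j] += row[i] * row[j]
    return solve3(ata, atb)

random.seed(0)
true_rotation = (0.02, -0.01, 0.03)  # hypothetical rotation parameters
points = [(i / 20.0 - 0.5, j / 20.0 - 0.5) for i in range(21) for j in range(21)]
flows = []
for x, y in points:
    u, v = rotational_flow(x, y, true_rotation)
    # Simulated measurement error (assumed Gaussian, sigma = 0.005).
    flows.append((u + random.gauss(0.0, 0.005), v + random.gauss(0.0, 0.005)))
estimate = fit_rotation(points, flows)
print(estimate)
```

Because every flow measurement across the extended region contributes to the same three unknowns, independent measurement errors tend to average out, which is the intuition behind the stability of the spatially extended schemes discussed above.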

With  regard  to  the  human  visual  system,  the  dependence  of  perceived  structure  on  the 
spatial  and  temporal  extent  of  the  viewed  motion  has  not  yet  been  studied  systematically, 
but  the  following  informal  observations  have  been  made.  Regarding  spatial  extent,  two  or 
three  points  undergoing  relative  motion  are  sufficient  to  elicit  a  perception  of  3-D  structure
(Borjesson  and  von  Hofsten,  1973;  Johansson,  1975),  although  theoretically  the  recovery 
of  structure  is  less  constrained  for  two  points  in  motion,  and  perceptually  the  sensation  of 
structure  is  weaker.  An  increase  in  the  number  of  moving  elements  in  view  appears  to  have 
little  effect  on  the  quality  of  perceived  structure  (for  example,  Petersik,  1980).  Regarding 
the  temporal  extent  of  viewed  motion,  Johansson  (1975)  showed  that  a  brief  observation  of 
patterns  of  moving  lights  generated  by  human  figures  moving  in  the  dark  (commonly  referred 
to  as  biological  motion  displays)  can  lead  to  a  perception  of  the  3-D  motion  and  structure
of  the  figures.  Other  perceptual  studies  indicate  that  the  human  visual  system  requires 
an  extended  time  period  to  reach  an  accurate  perception  of  3-D  structure  (Wallach  and 
O’Connell,  1953;  White  and  Mueser,  1960;  Green,  1961;  Doner,  Lappin  and  Perfetto,  1984;
Inada  et  al.,  1986).  A  brief  observation  of  a  moving  pattern  sometimes  yields  an  impression
of  structure  that  is  “flatter”  than  the  true  structure  of  the  moving  object.  Thus,  the  human 
visual  system  is  capable  of  deriving  some  sense  of  structure  from  motion  information  that 
is  integrated  over  a  small  extent  in  space  and  time.  An  accurate  perception  of  structure 
may,  however,  require  a  more  extended  viewing  period. 

Most  methods  compute  a  3-D  structure  from  motion  only  when  the  changing  image
can  be  interpreted  as  the  projection  of  a  rigid  object  in  motion.  They  otherwise  yield  no 
interpretation  of  structure  or  yield  a  solution  that  is  incorrect  or  unstable.  Algorithms  that 
are  exceptions  to  this  can  interpret  only  restricted  classes  of  nonrigid  motions  (Bennett 
and  Hoffman,  1985;  Hoffman  and  Flinchbaugh,  1982;  Koenderink  and  van  Doorn,  1986).
The  human  visual  system,  however,  can  derive  some  sense  of  structure  for  a  wide  range 
of  nonrigid  motions,  including  stretching,  bending  and  more  complex  types  of  deformation 
(Johansson,  1964;  Jansson  and  Johansson,  1973;  Todd,  1982,  1984).  Furthermore,  displays 
of  rigid  objects  in  motion  sometimes  give  rise  to  the  perception  of  somewhat  distorting 
objects  (Wallach,  Weisz  and  Adams,  1956;  White  and  Mueser,  1960;  Green,  1961;  Braunstein,
1962;  Sperling  et  al.,  1983;  Braunstein  and  Andersen,  1984;  Hildreth,  1984;  Adelson,
1985).  These  observations  suggest  that  while  the  human  visual  system  tends  to  choose  rigid
interpretations  of  a  changing  image,  it  probably  does  not  use  the  rigidity  assumption  in  the 
strict  way  that  previous  computational  studies  have  suggested. 

Ullman  (1984)  proposed  a  more  flexible  method  for  deriving  structure  from  motion  that 
interprets  both  rigid  and  nonrigid  motion.  Referred  to  as  the  incremental  rigidity  scheme,
this  algorithm  uses  the  rigidity  assumption  in  a  different  way  from  previous  studies.  It
maintains  an  internal  model  of  the  structure  of  a  moving  object  that  consists  of  the  estimated 
3-D  coordinates  of  points  on  the  object.  The  model  is  continually  updated  as  new  positions
of  image  features  are  considered.  Initially,  the  object  is  assumed  to  be  flat,  if  no  other  cues 
to  3-D  structure  are  present.  Otherwise,  its  initial  structure  may  be  determined  by  other 
cues  available,  from  stereopsis,  shading,  texture,  or  perspective.  As  each  new  view  of  the 
moving  object  appears,  the  algorithm  computes  a  new  set  of  3-D  coordinates  for  points
on  the  object  that  maximizes  the  rigidity  in  the  transformation  from  the  current  model 
to  the  new  positions.  This  is  achieved  by  minimizing  the  change  in  the  3-D  distances
between  points  in  the  model.  Thus  the  algorithm  interprets  the  changing  2-D  image  as
the  projection  of  a  moving  3-D  object  that  changes  as  little  as  possible  from  one  moment 
to  the  next.  Through  a  process  of  repeatedly  considering  new  views  of  objects  in  motion 
and  updating  the  current  model  of  their  structure,  the  algorithm  builds  up  and  maintains 
a  3-D  model  of  the  objects.  If  objects  deform  over  time,  the  3-D  model  computed  by  the
algorithm  also  changes  over  time.  Other  models  have  been  proposed  that  impose  rigidity
by  requiring  that  the  3-D  distances  between  points  in  space  change  very  little  from  one 
moment  to  the  next  (for  example,  Mitiche,  1984,  1986;  Mitiche,  Seida  and  Aggarwal,  198')),
although  these  models  do  not  build  up  a  3-D  model  incrementally  as  in  Ullman’s  proposed
scheme.
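
A single update step of such a scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not Ullman's (1984) implementation. It assumes orthographic projection, so the new image fixes the x and y coordinates and only the depths are unknown. It uses the plain sum of squared changes in pairwise 3-D distances as the rigidity measure, and a simple gradient descent with step halving as the minimizer. The function names and the example object are hypothetical.

```python
import math
from itertools import combinations

def pair_distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rigidity_cost(model, candidate):
    """Sum of squared changes in pairwise 3-D distances (assumed measure)."""
    return sum(
        (pair_distance(candidate[i], candidate[j])
         - pair_distance(model[i], model[j])) ** 2
        for i, j in combinations(range(len(model)), 2)
    )

def attach_depths(image_points, depths):
    """Combine new image positions (x, y) with candidate depths z."""
    return [(x, y, z) for (x, y), z in zip(image_points, depths)]

def update_model(model, image_points, iterations=200):
    """Choose depths for the new view that change the model as little as possible."""
    depths = [p[2] for p in model]  # start from the current depth estimates
    cost = rigidity_cost(model, attach_depths(image_points, depths))
    for _ in range(iterations):
        cand = attach_depths(image_points, depths)
        grad = [0.0] * len(depths)
        for i, j in combinations(range(len(depths)), 2):
            new_d = pair_distance(cand[i], cand[j])
            if new_d == 0.0:
                continue  # coincident points carry no gradient information
            g = (2.0 * (new_d - pair_distance(model[i], model[j]))
                 * (depths[i] - depths[j]) / new_d)
            grad[i] += g
            grad[j] -= g
        step = 1.0
        while step > 1e-9:
            trial = [z - step * g for z, g in zip(depths, grad)]
            trial_cost = rigidity_cost(model, attach_depths(image_points, trial))
            if trial_cost < cost:  # accept only steps that increase rigidity
                depths, cost = trial, trial_cost
                break
            step *= 0.5
        else:
            break  # no improving step was found; stop early
    return attach_depths(image_points, depths)

# Hypothetical example: the current model, then a view after a 10 degree
# rotation about the vertical axis, projected orthographically.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, -0.5), (1.0, 1.0, 1.0)]
c, s = math.cos(math.radians(10.0)), math.sin(math.radians(10.0))
new_image = [(x * c + z * s, y) for x, y, z in model]
updated = update_model(model, new_image)
print(updated)
```

Calling update_model once per incoming view, with each result serving as the model for the next call, gives the incremental behavior described above. One caveat of this particular sketch: a perfectly flat starting model is a stationary point of the cost because of the depth-reversal symmetry, so a flat initial model would need a small perturbation of its depths before the first update.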

The  method  proposed  by  Ullman  (1984)  was  motivated  in  part  by  the  limitations  of 
previous  computer  algorithms  and  in  part  by  knowledge  of  the  human  visual  system.  The 
method  has  overcome  limitations  of  previous  computational  studies  in  two  ways.  First,  it 
provides  a  reliable  recovery  of  structure  in  the  presence  of  error  in  the  image  measurements, 
by  integrating  image  information  over  an  extended  time  period.  Second,  it  allows  the 
interpretation  of  nonrigid  motions.  These  are  essential  qualities  for  any  method  that  is 
proposed  as  a  viable  model  for  the  recovery  of  structure  from  motion  by  the  human  visual 
system.  This  method  also  has  other  attributes  that  are  consistent  with  human  perceptual
behavior:  (1)  it  sometimes  yields  a  nonrigid  interpretation  of  rigid  structures  in  motion, 
(2)  a  brief  viewing  time  results  in  a  structure  that  is  “flatter”  than  the  true  structure  of 
the  object,  (3)  it  allows  a  3-D  interpretation  of  scenes  containing  as  few  as  two  points  in 
motion  (Borjesson  and  von  Hofsten,  1973;  Johansson,  1975),  and  (4)  it  provides  a  natural 
means  for  integrating  multiple  sources  of  3-D  information. 

A  recent  computational  study  by  Grzywacz  and  Hildreth  (1985)  has  extended  Ullman’s
incremental  rigidity  scheme,  presenting  a  formulation  of  the  algorithm  that  makes  direct 
use  of  instantaneous  velocity  information  over  an  extended  time,  and  showing  how  the 
algorithm  can  be  modified  to  use  perspective  projection  of  the  scene  onto  the  image.  With 
regard  to  the  use  of  velocities,  previous  studies  had  suggested  that  the  recovery  of  3-D 
structure  from  velocity  information  at  a  single  moment  is  inherently  unstable  (Prazdny,
1980;  Ullman,  1983).  Through  computer  simulations  and  a  theoretical  analysis,  Grzywacz
and  Hildreth  showed  that  the  integration  of  velocity  information  over  an  extended  time  does 
not  overcome  this  problem  of  instability.  The  velocity-based  formulation  of  the  incremental
rigidity  scheme  does  not  yield  a  robust  computation  of  structure  over  an  extended  time; 
rather,  the  solution  oscillates  between  good  and  poor  estimates  of  the  3-D  structure  of  a 
moving  object.  More  generally,  if  discrete  views  of  moving  elements  are  used  instead,  the 
incremental  rigidity  scheme  performs  best  when  the  spatial  changes  between  views  are  large. 
For  example,  if  an  object  is  rotating,  the  algorithm  computes  a  better  3-D  structure  for  the 
object  if  larger  angular  rotations  between  discrete  frames  are  considered. 

With  regard  to  the  human  visual  system,  two  points  are  worth  noting.  First,  it  is  unlikely
that  discrete  movie-like  “snapshots”  form  a  direct  input  to  the  recovery  of  3-D  structure
from  motion.  Second,  if  a
short-range  motion  measurement  system  exists  and  provides  essentially  instantaneous  mea¬ 
surements  of  movement  in  the  changing  image,  these  measurements  should  be  used  in  some 
way  to  interpret  the  3-D  structure  of  the  scene.  These  short-range  measurements  may, 
however,  form  the  input  to  a  longer-range  tracking  operation  that  integrates  image  motion 
information  over  a  more  extended  time  for  the  accurate  recovery  of  3-D  structure.  In  any 
case,  the  short-range  measurements  can  also  be  used  to  identify  motion  discontinuities, 
which  are  likely  to  indicate  the  locations  of  object  boundaries  in  the  scene.  Knowledge  of 
object  boundaries  can  improve  the  overall  recovery  of  structure  from  motion. 

This  discussion  of  the  structure-from-motion  problem  illustrates  a  number  of  important
points  that  often  arise  in  the  computational  study  of  other  problems  in  the  early
stages  of  vision.  First,  a  single  solution  to  the  problem  cannot  be  obtained  from  informa¬ 
tion  in  the  image  alone;  additional  constraint  is  required.  Second,  theoretical  studies  can 
be  used  to  show  that  a  general  physical  assumption  such  as  rigidity  is  sufficient  to  solve 
the  structure-from-motion  problem  uniquely.  Third,  an  assumption  such  as  rigidity  can  be 
incorporated  in  many  ways  into  an  algorithm  to  recover  structure.  The  development  of  a 
reliable  algorithm  requires  a  cycling  between  computer  implementation,  testing,  and  refine¬ 
ment.  Finally,  perceptual  studies  can  suggest  and  test  particular  assumptions  and  reveal 
aspects  of  the  algorithm  used  by  the  human  visual  system  for  solving  a  given  problem.  It 
is  typical  of  computational  studies  that  the  initial  methods  proposed  for  solving  a  problem 
only  loosely  consider  the  detailed  observations  of  biological  systems.  These  first  studies  un¬ 
cover  useful  aspects  of  the  problems,  however.  Later  studies  then  combine  this  knowledge 
of  the  problem  with  observations  of  biological  systems  to  derive  models  that  more  closely 
reflect  the  computations  carried  out  in  biological  systems. 


Physiological  Studies  of  the  Recovery  of  Structure  from  Motion 


Physiological  studies  have  uncovered  neurons  in  higher  cortical  areas  that  are  sensitive 
to  properties  of  the  motion  field  that  may  be  relevant  to  the  recovery  of  the  3-D  structure
and  motion  of  surfaces  in  the  environment,  or  to  the  recovery  of  the  motion  of  the  observer 
relative  to  the  scene.  Many  studies  have  revealed  neurons  sensitive  to  uniform  expansion 
or  contraction  of  the  visual  field,  a  property  that  is  correlated  either  with  translation  of 
the  observer  forward  or  backward,  or  equivalently,  motion  of  an  object  toward  or  away 
from  the  observer.  Such  neurons  have  been  found,  for  example,  in  the  posterior  parietal 
cortex  of  the  monkey  (Motter  and  Mountcastle,  1981;  Andersen,  1986).  Other  neurons  have 
been  found  that  are  sensitive  to  global  rotations  in  the  visual  field  (Andersen,  1986;  Sakata
et  al.,  1985).  All  of  these  neurons  have  large  receptive  fields,  so  they  probably  lack  the
spatial  sensitivity  required  to  derive  the  detailed  shape  of  an  object  surface  from  relative 
motion.  In  the  human  visual  system,  the  accurate  recovery  of  object  shape  from  motion 
may  be  an  ability  that  is  restricted  to  the  central  region  of  the  eye;  the  ability  to  interpret 
2-D  structure-from-motion  displays  appears  to  degrade  rapidly  as  one  moves  away  from 
the  fovea  (S.  Ullman,  personal  communication).  Siegel  and  Andersen  (1986)  showed  that 
motion  processing  in  area  MT  is  critical  to  the  recovery  of  structure  from  motion. 
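
The flow-field properties mentioned above, uniform expansion or contraction and global rotation, correspond to the divergence and curl of the image velocity field. The following sketch only illustrates these two quantities. It is not a model of the neurons described in the cited studies. The function names, the finite-difference step, and the two test fields are hypothetical.

```python
def mean_divergence_and_curl(flow, centers, h=0.01):
    """Average divergence and curl of a 2-D flow field, estimated by
    central differences at the given sample positions."""
    div_sum = 0.0
    curl_sum = 0.0
    for x, y in centers:
        u_xp, v_xp = flow(x + h, y)
        u_xm, v_xm = flow(x - h, y)
        u_yp, v_yp = flow(x, y + h)
        u_ym, v_ym = flow(x, y - h)
        du_dx = (u_xp - u_xm) / (2.0 * h)
        dv_dx = (v_xp - v_xm) / (2.0 * h)
        du_dy = (u_yp - u_ym) / (2.0 * h)
        dv_dy = (v_yp - v_ym) / (2.0 * h)
        div_sum += du_dx + dv_dy   # expansion (positive) or contraction (negative)
        curl_sum += dv_dx - du_dy  # rotation in the image plane
    n = len(centers)
    return div_sum / n, curl_sum / n

def expansion(x, y, k=0.5):
    """Uniform expansion about the origin, as for an approaching surface."""
    return k * x, k * y

def rotation(x, y, w=0.3):
    """Rigid rotation of the image about the origin."""
    return -w * y, w * x

centers = [(i / 10.0 - 0.5, j / 10.0 - 0.5) for i in range(11) for j in range(11)]
print(mean_divergence_and_curl(expansion, centers))
print(mean_divergence_and_curl(rotation, centers))
```

For the expansion field the divergence is twice the expansion rate and the curl vanishes. For the rotation field the curl is twice the angular velocity and the divergence vanishes. Averaging over a large set of positions is loosely analogous to pooling over a large receptive field, which yields a global measure while discarding the local variation needed to recover detailed surface shape.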

The  neurons  sensitive  to  relative  movement  that  were  discussed  in  the  context  of  mo¬ 
tion  discontinuities  may  also  contribute  to  the  recovery  of  3-D  structure.  Certainly  the 
detection  and  localization  of  object  boundaries  is  essential  to  the  construction  of  a  3-D
representation  of  surfaces  in  the  scene.  Mechanisms  such  as  the  “convexity”  detector  suggested
by  Nakayama  and  Loomis  (1974)  may  also  derive  information  about  the  relative
depths  of  surfaces  on  either  side  of  a  motion  boundary.  The  computational  study  by  Mutch
and  Thompson  (1985)  also  addressed  this  issue. 

Regan  and  Beverley  (1979,  1983)  have  hypothesized  the  existence  of  ‘changing-size’
detectors  (analogous  to  detectors  of  uniform  expansion  or  contraction  in  the  visual  field) 
based  on  psychophysical  evidence  from  adaptation  studies.  They  also  suggested  that  the 
changing-size  detectors  may  be  distinct  from  neural  mechanisms  signaling  motion  in  depth 
(Beverley  and  Regan,  1979).  Neurons  exist  in  area  18  of  the  cat  visual  cortex  (for  example, 
Cynader  and  Regan,  1978,  1982)  and  area  V1  of  the  primate  visual  cortex  (Poggio  and
Talbot,  1981)  that  appear  to  be  selective  for  direction  of  movement  in  depth.  These  studies 
of  cells  responsive  to  movement  in  depth  used  binocularly  viewed  moving  bars,  however, 
so  they  may  address  the  interaction  between  binocular  stereopsis  and  motion  measurement 
for  the  recovery  of  movement  in  space,  rather  than  the  recovery  of  structure  from  motion 
alone. 

CONCLUDING  REMARKS

In  this  review  we  have  tried  to  integrate  studies  from  computation,  psychophysics, 
physiology  and  biophysics  into  a  computational  framework.  The  interaction  between  these 
different  approaches  promises  to  be  fruitful  in  furthering  our  understanding  of  motion  anal¬ 
ysis  in  biological  vision  systems,  because  the  various  perspectives  each  provide  valuable  and 
different  insight  into  how  vision  systems  analyze  motion  information. 

Perceptual  studies,  for  example,  help  to  define  the  problems  in  motion  analysis  that  are 
solved,  and  reveal  the  quantitative  ability  with  which  the  human  visual  system  can  solve 
these  problems.  We  have  seen  that  many  problems  in  motion  analysis  do  not  have  a  unique 
solution,  and  additional  constraint  must  be  imposed  to  solve  them.  There  are  often  different 
choices  for  the  assumptions  that  could  be  embodied  in  the  underlying  computations,  which 
critical  perceptual  experiments  can  attempt  to  distinguish.  There  are  also  many  algorithms 
that  could  solve  a  given  problem,  and  different  algorithms  might  fail  in  different  ways. 
Again,  critical  perceptual  experiments  can  be  designed  to  determine  whether  the  human 
visual  system  fails  in  the  same  way.  It  is  often  the  case  that  perceptual  studies  provide 
initial  hints  about  the  strategies  used  in  the  underlying  computations. 

Studies  from  physiology  and  biophysics  can  reveal  what  parts  of  the  visual  system  are 
involved  in  a  particular  computation,  and  what  the  elementary  operations  are  that  neurons 
use  in  processing  motion  information.  Properties  of  the  underlying  hardware  also  constrain 
the  nature  of  the  algorithms  and  representations  that  are  used  in  motion  computations.  De¬ 
tailed  computer  models  of  neuronal  networks  subserving  motion  measurement  have  helped 
to  focus  further  experimental  questions  regarding  physiological  and  biophysical  behavior. 
Finally,  physiological  methods  can  help  eliminate  ambiguities  in  perceptual  studies.  Be¬ 
cause  the  primate  visual  system  may  have  evolved  a  variety  of  different  algorithms  to  cope 
with  a  particular  problem,  a  psychophysical  paradigm  may  be  unable  to  distinguish  between 
these  different  algorithms,  while  single-cell  recordings  may  do  so. 

Computational  studies  help  to  focus  questions  for  perceptual  studies  about  the  assumptions,
representations,  and  algorithms  used  by  the  human  visual  system  to  analyze
motion.  Implementations  of  proposed  algorithms  have  provided  powerful  predictive  tools 
for  making  hypotheses  about  what  the  behavior  of  the  system  ought  to  be  if  it  is  per¬ 
forming  motion  computations  in  particular  ways.  In  the  case  of  physiological  studies,  by 
elucidating  the  problems  that  need  to  be  solved  in  motion  analysis,  computational  studies 
can  aid  the  initial  exploration  of  the  function  of  neurons  in  motion-sensitive  areas  in  the
visual  pathway.  By  elucidating  possible  methods  by  which  computations  can  be  performed, 
computational  studies  can  help  to  refine  our  understanding  of  how  neurons  function  and  by 
what  mechanisms. 